- 19 9月, 2020 9 次提交
-
-
由 Randy Dunlap 提交于
Drop repeated words in net/rds/. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: linux-rdma@vger.kernel.org Cc: rds-devel@oss.oracle.com Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Randy Dunlap 提交于
Drop repeated words in net/core/. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
Rekeying is required for security since a key is less secure when using for a long time. Also, key will be detached when its nonce value (or seqno ...) is exhausted. We now make the rekeying process automatic and configurable by user. Basically, TIPC will at a specific interval generate a new key by using the kernel 'Random Number Generator' cipher, then attach it as the node TX key and securely distribute to others in the cluster as RX keys (- the key exchange). The automatic key switching will then take over, and make the new key active shortly. Afterwards, the traffic from this node will be encrypted with the new session key. The same can happen in peer nodes but not necessarily at the same time. For simplicity, the automatically generated key will be initiated as a per node key. It is not too hard to also support a cluster key rekeying (e.g. a given node will generate a unique cluster key and update to the others in the cluster...), but that doesn't bring much benefit, while a per-node key is even more secure. We also enable user to force a rekeying or change the rekeying interval via netlink, the new 'set key' command option: 'TIPC_NLA_NODE_REKEYING' is added for these purposes as follows: - A value >= 1 will be set as the rekeying interval (in minutes); - A value of 0 will disable the rekeying; - A value of 'TIPC_REKEYING_NOW' (~0) will force an immediate rekeying; The default rekeying interval is (60 * 24) minutes i.e. done every day. There isn't any restriction for the value but user shouldn't set it too small or too large which results in an "ineffective" rekeying (thats ok for testing though). Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
With support from the master key option in the previous commit, it becomes easy to make frequent updates/exchanges of session keys between authenticated cluster nodes. Basically, there are two situations where the key exchange will take in place: - When a new node joins the cluster (with the master key), it will need to get its peer's TX key, so that be able to decrypt further messages from that peer. - When a new session key is generated (by either user manual setting or later automatic rekeying feature), the key will be distributed to all peer nodes in the cluster. A key to be exchanged is encapsulated in the data part of a 'MSG_CRYPTO /KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and encrypted by using the master key before sending out. Upon receipt of the message it will be decrypted in the same way as regular messages, then attached as the sender's RX key in the receiver node. In this way, the key exchange is reliable by the link layer, as well as security, integrity and authenticity by the crypto layer. Also, the forward security will be easily achieved by user changing the master key actively but this should not be required very frequently. The key exchange feature is independent on the presence of a master key Note however that the master key still is needed for new nodes to be able to join the cluster. It is also optional, and can be turned off/on via the sysfs: 'net/tipc/key_exchange_enabled' [default 1: enabled]. Backward compatibility is guaranteed because for nodes that do not have master key support, key exchange using master key ie. tx_key = 0 if any will be shortly discarded at the message validation step. In other words, the key exchange feature will be automatically disabled to those nodes. v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'" error in node.c. The function only exists when built with the TIPC "CONFIG_TIPC_CRYPTO" option. v3: use 'info->extack' for a message emitted due to netlink operations instead (- David's comment). Reported-by: Nkernel test robot <lkp@intel.com> Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
In addition to the supported cluster & per-node encryption keys for the en/decryption of TIPC messages, we now introduce one option for user to set a cluster key as 'master key', which is simply a symmetric key like the former but has a longer life cycle. It has two purposes: - Authentication of new member nodes in the cluster. New nodes, having no knowledge of current session keys in the cluster will still be able to join the cluster as long as they know the master key. This is because all neighbor discovery (LINK_CONFIG) messages must be encrypted with this key. - Encryption of session encryption keys during automatic exchange and update of those.This is a feature we will introduce in a later commit in this series. We insert the new key into the currently unused slot 0 in the key array and start using it immediately once the user has set it. After joining, a node only knowing the master key should be fully communicable to existing nodes in the cluster, although those nodes may have their own session keys activated (i.e. not the master one). To support this, we define a 'grace period', starting from the time a node itself reports having no RX keys, so the existing nodes will use the master key for encryption instead. The grace period can be extended but will automatically stop after e.g. 5 seconds without a new report. This is also the basis for later key exchanging feature as the new node will be impossible to decrypt anything without the support from master key. For user to set a master key, we define a new netlink flag - 'TIPC_NLA_NODE_KEY_MASTER', so it can be added to the current 'set key' netlink command to specify the setting key to be a master key. Above all, the traditional cluster/per-node key mechanism is guaranteed to work when user comes not to use this master key option. This is also compatible to legacy nodes without the feature supported. Even this master key can be updated without any interruption of cluster connectivity but is so is needed, this has to be coordinated and set by the user. Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
We reduce the lasting time for a pending TX key to be active as well as for a passive RX key to be freed which generally helps speed up the key switching. It is not expected to be too fast but should not be too slow either. Also the key handling logic is simplified that a pending RX key will be removed automatically if it is found not working after a number of times; the probing for a pending TX key is now carried on a specific message user ('LINK_PROTOCOL' or 'LINK_CONFIG') which is more efficient than using a timer on broadcast messages, the timer is reserved for use later as needed. The kernel logs or 'pr***()' are now made as clear as possible to user. Some prints are added, removed or changed to the debug-level. The 'TIPC_CRYPTO_DEBUG' definition is removed, and the 'pr_debug()' is used instead which will be much helpful in runtime. Besides we also optimize the code in some other places as a preparation for later commits. v2: silent more kernel logs, also use 'info->extack' for a message emitted due to netlink operations instead (- David's comments). Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shannon Nelson 提交于
The dev flash status notify function parameter lists are getting rather long, so add a struct to be filled and passed rather than continuously changing the function signatures. Signed-off-by: NShannon Nelson <snelson@pensando.io> Reviewed-by: NJacob Keller <jacob.e.keller@intel.com> Reviewed-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shannon Nelson 提交于
Add a timeout element to the DEVLINK_CMD_FLASH_UPDATE_STATUS netlink message for use by a userland utility to show that a particular firmware flash activity may take a long but bounded time to finish. Also add a handy helper for drivers to make use of the new timeout value. UI usage hints: - if non-zero, add timeout display to the end of the status line [component] status_msg ( Xm Ys : Am Bs ) using the timeout value for Am Bs and updating the Xm Ys every second - if the timeout expires while awaiting the next update, display something like [component] status_msg ( timeout reached : Am Bs ) - if new status notify messages are received, remove the timeout and start over Signed-off-by: NShannon Nelson <snelson@pensando.io> Reviewed-by: NJakub Kicinski <kuba@kernel.org> Reviewed-by: NJacob Keller <jacob.e.keller@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francesco Ruggeri 提交于
The combination of aca_free_rcu, introduced in commit 2384d025 ("net/ipv6: Add anycast addresses to a global hashtable"), and fib6_info_destroy_rcu, introduced in commit 9b0a8da8 ("net/ipv6: respect rcu grace period before freeing fib6_info"), can result in an extra rcu grace period being needed when deleting an interface, with the result that netdev_wait_allrefs ends up hitting the msleep(250), which is considerably longer than the required grace period. This can result in long delays when deleting a large number of interfaces, and it can be observed with this script: ns=dummy-ns NIFS=100 ip netns add $ns ip netns exec $ns ip link set lo up ip netns exec $ns sysctl net.ipv6.conf.default.disable_ipv6=0 ip netns exec $ns sysctl net.ipv6.conf.default.forwarding=1 for ((i=0; i<$NIFS; i++)) do if=eth$i ip netns exec $ns ip link add $if type dummy ip netns exec $ns ip link set $if up ip netns exec $ns ip -6 addr add 2021:$i::1/120 dev $if done for ((i=0; i<$NIFS; i++)) do if=eth$i ip netns exec $ns ip link del $if done ip netns del $ns Instead of using a fixed msleep(250), this patch tries an extra rcu_barrier() followed by an exponential backoff. Time with this patch on a 5.4 kernel: real 0m7.704s user 0m0.385s sys 0m1.230s Time without this patch: real 0m31.522s user 0m0.438s sys 0m1.156s v2: use exponential backoff instead of trying to wake up netdev_wait_allrefs. v3: preserve reverse christmas tree ordering of local variables v4: try an extra rcu_barrier before the backoff, plus some cosmetic changes. Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2020 6 次提交
-
-
由 Paolo Abeni 提交于
Christoph reported an infinite loop in the subflow receive path under stress condition. If there are multiple subflows, each of them using a large send buffer, the delta between the sequence number used by MPTCP-level retransmission can and the current msk->ack_seq can be greater than MAX_INT. In the above scenario, when calling mptcp_subflow_discard_data(), such delta will be truncated to int, and could result in a negative number: no bytes will be dropped, and subflow_check_data_avail() will try again to process the same packet, looping forever. This change addresses the issue by expanding the 'limit' size to 64 bits, so that overflows are not possible anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/87 Fixes: 6719331c ("mptcp: trigger msk processing even for OoO data") Reported-and-tested-by: NChristoph Paasch <cpaasch@apple.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ursula Braun 提交于
If smc_listen_rmda_finish() returns with an error, the storage addressed by 'buf' is freed a second time. Consolidate freeing under a common label and jump to that label. Fixes: 6bb14e48 ("net/smc: dynamic allocation of CLC proposal buffer") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Qinglang Miao 提交于
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yang Yingliang 提交于
It's hard to read the code without spaces around '&', for better reading, add spaces around '&'. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ye Bin 提交于
Fixes coccicheck warnig: net/mptcp/protocol.c:164:11-18: WARNING: Unsigned expression compared with zero: max_seq > 0 Fixes: ab174ad8 ("mptcp: move ooo skbs into msk out of order queue") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYe Bin <yebin10@huawei.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xie He 提交于
1. Change all "dev->hard_header" to "dev->header_ops" 2. On receiving incoming frames when header_ops == NULL: The comment only says what is wrong, but doesn't say what is right. This patch changes the comment to make it clear what is right. 3. On transmitting and receiving outgoing frames when header_ops == NULL: The comment explains that the LL header will be later added by the driver. However, I think it's better to simply say that the LL header is invisible to us. This phrasing is better from a software engineering perspective, because this makes it clear that what happens in the driver should be hidden from us and we should not care about what happens internally in the driver. 4. On resuming the LL header (for RAW frames) when header_ops == NULL: The comment says we are "unlikely" to restore the LL header. However, we should say that we are "unable" to restore it. It's not possible (rather than not likely) to restore it, because: 1) There is no way for us to restore because the LL header internally processed by the driver should be invisible to us. 2) In function packet_rcv and tpacket_rcv, the code only tries to restore the LL header when header_ops != NULL. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: NXie He <xie.he.0141@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 9月, 2020 2 次提交
-
-
由 Karsten Graul 提交于
smc->clcsock and smc->clcsock->sk are used before the check if they can be dereferenced. Fix this by checking the variables first. Fixes: a60a2b1e ("net/smc: reduce active tcp_listen workers") Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should not ignore the return value of __grp_src_toex_excl() as we'll miss sending notifications about group changes. Fixes: 5bf1e00b ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 9月, 2020 5 次提交
-
-
由 Ido Schimmel 提交于
Currently, the in-kernel delete notification is emitted from the error path of nexthop_add() and replace_nexthop(), which can be confusing to in-kernel listeners as they are not familiar with the nexthop. Instead, only emit the notification when the nexthop is actually deleted. The following sub-cases are covered: 1. User space deletes the nexthop 2. The nexthop is deleted by the kernel due to a netdev event (e.g., nexthop device going down) 3. A group is deleted because its last nexthop is being deleted 4. The network namespace of the nexthop device is deleted Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
Currently, the only listener of the nexthop notification chain is the VXLAN driver. Subsequent patches will add more listeners (e.g., device drivers such as netdevsim) that need to be able to block when processing notifications. Therefore, convert the notification chain to a blocking one. This is safe as notifications are always emitted from process context. Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Introduce a test command for health reporters. User might use this command to trigger test event on a reporter if the reporter supports it. Signed-off-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Currently drivers have to report their pause frames statistics via ethtool -S, and there is a wide variety of names used for these statistics. Add the two statistics defined in IEEE 802.3x to the standard API. Create a new ethtool request header flag for including statistics in the response to GET commands. Always create the ETHTOOL_A_PAUSE_STATS nest in replies when flag is set. Testing if driver declares the op is not a reliable way of checking if any stats will actually be included and therefore we don't want to give the impression that presence of ETHTOOL_A_PAUSE_STATS indicates driver support. Note that this patch does not include PFC counters, which may fit better in dcbnl? But mostly I don't need them/have a setup to test them so I haven't looked deeply into exposing them :) v3: - add a helper for "uninitializing" stats, rather than a cryptic memset() (Andrew) Signed-off-by: NJakub Kicinski <kuba@kernel.org> Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexandra Winter 提交于
so the switchdev can notifiy the bridge to flush non-permanent fdb entries for this port. This is useful whenever the hardware fdb of the switchdev is reset, but the netdev and the bridgeport are not deleted. Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute. CC: Jiri Pirko <jiri@resnulli.us> CC: Ivan Vecera <ivecera@redhat.com> CC: Roopa Prabhu <roopa@nvidia.com> CC: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: NAlexandra Winter <wintera@linux.ibm.com> Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com> Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com> Acked-by: NIvan Vecera <ivecera@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 9月, 2020 18 次提交
-
-
由 Soheil Hassas Yeganeh 提交于
For EPOLLET, applications must call sendmsg until they get EAGAIN. Otherwise, there is no guarantee that EPOLLOUT is sent if there was a failure upon memory allocation. As a result on high-speed NICs, userspace observes multiple small sendmsgs after a partial sendmsg until EAGAIN, since TCP can send 1-2 TSOs in between two sendmsg syscalls: // One large partial send due to memory allocation failure. sendmsg(20MB) = 2MB // Many small sends until EAGAIN. sendmsg(18MB) = 64KB sendmsg(17.9MB) = 128KB sendmsg(17.8MB) = 64KB ... sendmsg(...) = EAGAIN // At this point, userspace can assume an EPOLLOUT. To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios to guarantee that we send EPOLLOUT after partial sendmsg. After this commit userspace can assume that it will receive an EPOLLOUT after the first partial sendmsg. This EPOLLOUT will benefit from sk_stream_write_space() logic delaying the EPOLLOUT until significant space is available in write queue. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Soheil Hassas Yeganeh 提交于
If there was any event available on the TCP socket, tcp_poll() will be called to retrieve all the events. In tcp_poll(), we call sk_stream_is_writeable() which returns true as long as we are at least one byte below notsent_lowat. This will result in quite a few spurious EPLLOUT and frequent tiny sendmsg() calls as a result. Similar to sk_stream_write_space(), use __sk_stream_is_writeable with a wake value of 1, so that we set EPOLLOUT only if half the space is available for write. Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
A DSA master interface has upper network devices, each representing an Ethernet switch port attached to it. Demultiplexing the source ports and setting skb->dev accordingly is done through the catch-all ETH_P_XDSA packet_type handler. Catch-all because DSA vendors have various header implementations, which can be placed anywhere in the frame: before the DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA handler acts like an rx_handler more than anything. It is unlikely for the DSA master interface to have any other upper than the DSA switch interfaces themselves. Only maybe a bridge upper*, but it is very likely that the DSA master will have no 8021q upper. So __netif_receive_skb_core() will try to untag the VLAN, despite the fact that the DSA switch interface might have an 8021q upper. So the skb will never reach that. So far, this hasn't been a problem because most of the possible placements of the DSA switch header mentioned in the first paragraph will displace the VLAN header when the DSA master receives the frame, so __netif_receive_skb_core() will not actually execute any VLAN-specific code for it. This only becomes a problem when the DSA switch header does not displace the VLAN header (for example with a tail tag). What the patch does is it bypasses the untagging of the skb when there is a DSA switch attached to this net device. So, DSA is the only packet_type handler which requires seeing the VLAN header. Once skb->dev will be changed, __netif_receive_skb_core() will be invoked again and untagging, or delivery to an 8021q upper, will happen in the RX of the DSA switch interface itself. *see commit 9eb8eff0 ("net: bridge: allow enslaving some DSA master network devices". This is actually the reason why I prefer keeping DSA as a packet_type handler of ETH_P_XDSA rather than converting to an rx_handler. Currently the rx_handler code doesn't support chaining, and this is a problem because a DSA master might be bridged. Signed-off-by: NVladimir Oltean <olteanv@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
flush_all_backlogs() may cause deadlock on systems running processes with FIFO scheduling policy. The above is critical in -RT scenarios, where user-space specifically ensure no network activity is scheduled on the CPU running the mentioned FIFO process, but still get stuck. This commit tries to address the problem checking the backlog status on the remote CPUs before scheduling the flush operation. If the backlog is empty, we can skip it. v1 -> v2: - explicitly clear flushed cpu mask - Eric Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state that remembers if some room has been made in the rtx queue by an incoming ACK packet. This is later used from tcp_check_space() before considering to send EPOLLOUT. Problem is: If we receive SACK packets, and no packet is removed from RTX queue, we can send fresh packets, thus moving them from write queue to rtx queue and eventually empty the write queue. This stall can happen if TCP_NOTSENT_LOWAT is used. With this fix, we no longer risk stalling sends while holes are repaired, and we can fully use socket sndbuf. This also removes a cache line dirtying for typical RPC workloads. Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xie He 提交于
This comment is outdated and no longer reflects the actual implementation of af_packet.c. Reasons for the new comment: 1. In af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and checks if the user has provided a header sized between (dev->min_header_len) and (dev->hard_header_len) (in dev_validate_header). This shows the developers of af_packet.c expect hard_header_len to be consistent with header_ops. 2. In af_packet.c, the function packet_sendmsg_spkt has a FIXME comment. That comment states that prepending an LL header internally in a driver is considered a bug. I believe this bug can be fixed by setting hard_header_len to 0, making the internal header completely invisible to af_packet.c (and requesting the headroom in needed_headroom instead). 3. There is a commit for a WiFi driver: commit 9454f7a8 ("mwifiex: set needed_headroom, not hard_header_len") According to the discussion about it at: https://patchwork.kernel.org/patch/11407493/ The author tried to set the WiFi driver's hard_header_len to the Ethernet header length, and request additional header space internally needed by setting needed_headroom. This means this usage is already adopted by driver developers. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NXie He <xie.he.0141@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
That is needed to let the subflows announce promptly when new space is available in the receive buffer. tcp_cleanup_rbuf() is currently a static function, drop the scope modifier and add a declaration in the TCP header. Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Update the scheduler to less trivial heuristic: cache the last used subflow, and try to send on it a reasonably long burst of data. When the burst or the subflow send space is exhausted, pick the subflow with the lower ratio between write space and send buffer - that is, the subflow with the greater relative amount of free space. v1 -> v2: - fix 32 bit build breakage due to 64bits div - fix checkpath issues (uint64_t -> u64) Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently the 'backup' attribute of local endpoint is ignored. Let's use it for the MP_JOIN handshake Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
So that can be accessed easily from the subflow creation helper. No functional change intended. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Add a bunch of MPTCP mibs related to MPTCP OoO data processing. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
There is no need to use the tcp_read_sock(), we can simply drop the skb. Additionally try to look at the next buffer for in order data. This both simplifies the code and avoid unneeded indirect calls. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Add an RB-tree to cope with OoO (at MPTCP level) data. __mptcp_move_skb() insert into the RB tree "future" data, eventually coalescing skb as allowed by the MPTCP DSN. To simplify sequence accounting, move the DSN inside the cb. After successfully enqueuing in sequence data, check if we can use any data from the RB tree. Additionally move the data_fin check after spooling data from the OoO tree, otherwise we could miss shutdown events. The RB tree code is copied as verbatim as possible from tcp_data_queue_ofo(), with a few simplifications due to the fact that MPTCP doesn't need to cope with sacks. All bugs here are added by me. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Factor-out existing code, will be re-used by the next patch. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Let the msk sendbuf track the size of the larger subflow's send window, so that we ensure mptcp_sendmsg() does not exceed MPTCP-level send window. The update is performed just before try to send any data. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This is a prerequisite to allow receiving data from multiple subflows without re-injection. Instead of dropping the OoO - "future" data in subflow_check_data_avail(), call into __mptcp_move_skbs() and let the msk drop that. To avoid code duplication factor out the mptcp_subflow_discard_data() helper. Note that __mptcp_move_skbs() can now find multiple subflows with data avail (comprising to-be-discarded data), so must update the byte counter incrementally. v1 -> v2: - fix checkpatch issues (unsigned -> unsigned int) Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This simplify mptcp_subflow_data_available() and will made follow-up patches simpler. Additionally remove the unneeded checks on subflow copied_seq: we always whole skbs out of subflows. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently, when checking for the 'msk is writable' condition, we look at the individual subflows write space. That works well while we send data via a single subflow, but will not as soon as we will enable concurrent xmit on multiple subflows. With this change msk becomes writable when the following conditions hold: - the socket has some free write space - there is at least a subflow with write free space Additionally we need to set the NOSPACE bit on all subflows before blocking. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-