- 16 10月, 2018 7 次提交
-
-
由 Eric Dumazet 提交于
sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In EDT design, I made the mistake of using tcp_wstamp_ns to store the last tcp_clock_ns() sample and to store the pacing virtual timer. This causes major regressions at high speed flows. Introduce tcp_clock_cache to store last tcp_clock_ns(). This is needed because some arches have slow high-resolution kernel time service. tcp_wstamp_ns is only updated when a packet is sent. Note that we can remove tcp_mstamp in the future since tcp_mstamp is essentially tcp_clock_cache/1000, so the apparent socket size increase is temporary. Fixes: 9799ccb0 ("tcp: add tcp_wstamp_ns socket field") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Li RongQing 提交于
After per-port vlan stats, vlan stats should be released when fail to add vlan Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats") CC: bridge@lists.linux-foundation.org cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NZhang Yu <zhangyu31@baidu.com> Signed-off-by: NLi RongQing <lirongqing@baidu.com> Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Howells 提交于
Add /proc/net/rxrpc/peers to display the list of peers currently active. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Justin.Lee1@Dell.com 提交于
The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: NJustin Lee <justin.lee1@dell.com> Reviewed-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hoang Le 提交于
INADDR_ANY is hard-coded when activating UDP bearer. So, we could not bind to a specific IP address even with replicast mode using - given remote ip address instead of using multicast ip address. In this commit, we fixed it by checking and switch to use appropriate local ip address. before: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **0.0.0.0:6118** 0.0.0.0:* after: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **10.0.0.2:6118** 0.0.0.0:* Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maciej W. Rozycki 提交于
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue is queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames which are queued back to the Tx queue for the RMC to send to the ring. When a network tap is attached to an FDDI interface handled by `defza' any incoming SMT frames captured are queued to our usual processing of network data received, which in turn delivers them to any listening taps. However the outgoing SMT frames produced by the firmware bypass our network protocol stack and are therefore not delivered to taps. This in turn means that taps are missing a part of network traffic sent by the adapter, which may make it more difficult to track down network problems or do general traffic analysis. Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that a network tap is attached, with a newly-created `dev_nit_active' helper wrapping the usual condition used in the transmit path. Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 10月, 2018 3 次提交
-
-
由 Nikolay Aleksandrov 提交于
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
When a link's carrier goes down it could be a sign of the port changing networks. If the new network has overlapping addresses with the old one, then the kernel will continue trying to use neighbor entries established based on the old network until the entries finally age out - meaning a potentially long delay with communications not working. This patch evicts neighbor entries on carrier down with the exception of those marked permanent. Permanent entries are managed by userspace (either an admin or a routing daemon such as FRR). Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 10月, 2018 4 次提交
-
-
由 Anilkumar Kolli 提交于
Current mac80211 has provision to update tx status through ieee80211_tx_status() and ieee80211_tx_status_ext(). But drivers like ath10k updates the tx status from the skb except txrate, txrate will be updated from a different path, peer stats. Using ieee80211_tx_status_ext() in two different paths (one for the stats, one for the tx rate) would duplicate the stats instead. To avoid this stats duplication, ieee80211_tx_rate_update() is implemented. Signed-off-by: NAnilkumar Kolli <akolli@codeaurora.org> [minor commit message editing, use initializers in code] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Ankita Bajaj 提交于
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: NAnkita Bajaj <bankita@codeaurora.org> Signed-off-by: NJouni Malinen <jouni@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Pradeep Kumar Chitrapu 提交于
New bss param ftm_responder is used to notify the driver to enable fine timing request (FTM) responder role in AP mode. Plumb the new cfg80211 API for FTM responder statistics through to the driver API in mac80211. Signed-off-by: NDavid Spinadel <david.spinadel@intel.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NPradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Ying Xue 提交于
When booting kernel with LOCKDEP option, below warning info was found: WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 but task is already holding lock: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->lock)->rlock#4); lock(&(&list->lock)->rlock#4); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/0/1: #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at: register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051 #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline] check_deadlock kernel/locking/lockdep.c:1803 [inline] validate_chain kernel/locking/lockdep.c:2399 [inline] __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521 tipc_init_net+0x472/0x610 net/tipc/core.c:82 ops_init+0xf7/0x520 net/core/net_namespace.c:129 __register_pernet_operations net/core/net_namespace.c:940 [inline] register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052 tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline] kernel_init_freeable+0x4bd/0x57f init/main.c:1144 kernel_init+0x13/0x180 init/main.c:1063 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 The reason why the noise above was complained by LOCKDEP is because we nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset function. In fact it's unnecessary to move skb buffer from l->wakeupq queue to l->inputq queue while holding the two locks at the same time. Instead, we can move skb buffers in l->wakeupq queue to a temporary list first and then move the buffers of the temporary list to l->inputq queue, which is also safe for us. Fixes: 3f32d0be ("tipc: lock wakeup & inputq at tipc_link_reset()") Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 10月, 2018 26 次提交
-
-
由 Jouni Malinen 提交于
Previous implementation of SAE authentication in infrastructure BSS was somewhat restricting and not exactly clean way of handling the two auth() operations. This ended up removing and re-adding the STA entry for the AP in the middle of authentication and also messing up authentication state tracking through the sequence of four Authentication frames. Furthermore, this did not work if the AP ended up sending out SAE Confirm (auth trans #2) immediately after SAE Commit (auth trans #1) before the station had time to transmit its SAE Confirm. Clean up authentication state handling for the SAE case to allow two rounds of auth() calls without dropping all state between those operations. Track peer Confirmed status and mark authentication completed only once both ends have confirmed. ieee80211_mgd_auth() check for EBUSY cases is now handling only the pending association (ifmgd->assoc_data) while all pending authentication (ifmgd->auth_data) cases are allowed to proceed to allow user space to start a new connection attempt from scratch even if the previously requested authentication is still waiting completion. This is needed to avoid making SAE error cases with retries take excessive amount of time with no means for the user space to stop that (apart from setting the netdev down). As an extra bonus, the end of ieee80211_rx_mgmt_auth() can be cleaned up to avoid the extra copy of the cfg80211_rx_mlme_mgmt() call for ongoing SAE authentication since the new ieee80211_mark_sta_auth() helper function can handle both completion of authentication and updates to the STA entry under the same condition and there is no need to return from the function between those operations. Signed-off-by: NJouni Malinen <jouni@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Jouni Malinen 提交于
This makes it easier to conditionally replace full allocation of auth_data to use reallocation for the case of continuing SAE authentication. Furthermore, there was not really any point in having this check done so late in the function after having already completed number of steps that cannot be used anyway in the error case. Signed-off-by: NJouni Malinen <jouni@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Jouni Malinen 提交于
Authentication exchange can be completed in both TX and RX paths for SAE, so move this common functionality into a helper function to avoid having to implement practically the same operations in two places when extending SAE implementation in the following commits. Signed-off-by: NJouni Malinen <jouni@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
When there are few packets (e.g. for sampling attempts), the exponentially weighted variance is usually vastly overestimated, making the resulting data essentially useless. As far as I know, there has not been any practical use for this, so let's not waste any cycles on it. Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
These rates are highly unlikely to be used quickly, even if the link deteriorates rapidly. This improves throughput in cases where CCK rates are not reliable enough to be skipped entirely during sampling. Sampling these rates regularly can cost a lot of airtime. Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
Long/short preamble selection cannot be sampled separately, since it depends on the BSS state. Because of that, sampling attempts to currently not used preamble modes are not counted in the statistics, which leads to CCK rates being sampled too often. Fix statistics accounting for long/short preamble by increasing the index where necessary. Fix excessive CCK rate sampling by dropping unsupported sample attempts. This improves throughput on 2.4 GHz channels Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
Fixes a harmless underflow issue when CCK rates are actively being used Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
mi->supported[MINSTREL_CCK_GROUP] needs to be updated short preamble rates need to be marked as supported regardless of whether it's currently enabled. Its state can change at any time without a rate_update call. Fixes: 782dda00 ("mac80211: minstrel_ht: move short preamble check out of get_rate") Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
By storing a shift value for all duration values of a group, we can reduce precision by a neglegible amount to make it fit into a u16 value. This improves cache footprint and reduces size: Before: text data bss dec hex filename 10024 116 0 10140 279c rc80211_minstrel_ht.o After: text data bss dec hex filename 9368 116 0 9484 250c rc80211_minstrel_ht.o Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
Legacy-only devices are not very common and the overhead of the extra code for HT and VHT rates is not big enough to justify all those extra lines of code to make it optional. Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
debugfs entries are cleaned up by debugfs_remove_recursive already. Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Chaitanya T K 提交于
If peer support reception of STBC and LDPC, enable them for better performance. Signed-off-by: NChaitanya TK <chaitanya.mgit@gmail.com> Signed-off-by: NFelix Fietkau <nbd@nbd.name> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
I'm not really sure exactly _why_ I've been carrying a note for what's probably _years_ to check that we don't do this, but we clearly do reflect frames back to the station itself if it sends such. One way or the other, it's useless since the station doesn't really need the AP to talk to itself, so suppress it. While at it, clarify some of the logic by removing skb->data references in favour of the destination address (pointer) we already have separately. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
Instead of open-coding a lot of calls to is_valid_ie_attr(), add this validation directly to the policy, now that we can. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
Many range checks can be done in the policy, move them there. A few in mesh are added in the code (taken out of the macros) because they don't fit into the s16 range in the policy validation. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Parthasarathy Bhuvaragan 提交于
In tipc_sk_filter_rcv(), when we detect protocol messages with error we call tipc_sk_conn_proto_rcv() and let it reset the connection and notify the socket by calling sk->sk_state_change(). However, tipc_sk_filter_rcv() may have been called from the function tipc_backlog_rcv(), in which case the socket lock is held and the socket already awake. This means that the sk_state_change() call is ignored and the error notification lost. Now the receive queue will remain empty and the socket sleeps forever. In this commit, we convert the protocol message into a connection abort message and enqueue it into the socket's receive queue. By this addition to the above state change we cover all conditions. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
In the patch referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the affected links. However, the 'tolerance' field of the broadcast link is never set, and remains at zero. This renders the whole commit without the intended improving effect, but luckily also with no negative effect. In this commit we add the missing initialization. Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used in other places, it seems not correct to overwrite their fields in dev_init_scheduler_queue(). noop_qdisc is essentially a shared and read-only object, even if it is not marked as const because of some implementation detail. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Without CONFIG_INET enabled compiles fail with: net/mpls/af_mpls.o: In function `mpls_dump_routes': af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req' The preference is for MPLS to use the same handler as ipv4 and ipv6 to allow consistency when doing a dump for AF_UNSPEC which walks all address families invoking the route dump handler. If INET is disabled then fallback to an MPLS version which can be tighter on the data checks. Fixes: e8ba330a ("rtnetlink: Update fib dumps for strict data checking") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Since commit 5aad1de5 ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mike Rapoport 提交于
The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe14 ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
DCTCP has two parts - a new ECN signalling mechanism and the response function to it. The first part can be used by other congestion control for DCTCP-ECN deployed networks. This patch moves that part into a separate tcp_dctcp.h to be used by other congestion control module (like how Yeah uses Vegas algorithmas). For example, BBR is experimenting such ECN signal currently https://tinyurl.com/ietf-102-iccrg-bbr2Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NYousuk Seung <ysseung@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
The NLM_F_DUMP_PROPER_HDR netlink flag was replaced by a setsockopt. Update the comment in rtnl_stats_dump. Fixes: 841891ec ("rtnetlink: Update rtnl_stats_dump for strict data checking") Reported-by: NChristian Brauner <christian@brauner.io> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Move setting of local variable ifm to after the message parsing in valid_fdb_dump_legacy. Avoid potential future use of unchecked variable. Fixes: 8dfbda19 ("rtnetlink: Move input checking for rtnl_fdb_dump to helper") Reported-by: NChristian Brauner <christian@brauner.io> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-