- 06 May, 2022 28 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Committed by David S. Miller
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-05 This series contains updates to ice driver only. Wan Jiabing converts an open coded min selection to min_t(). Maciej commonizes on a single find VSI function and removes the duplicated implementation. Wojciech adjusts the return value when exceeding ICE_MAX_CHAIN_WORDS to the more appropriate -ENOSPC and allows for the error to be propagated. Michal adds support for ndo_get_devlink_port(). Jake does some cleanup related to virtualization code, mainly involving function header comments and wording changes. NULL checks are added to ice_get_vf_vsi() calls in order to prevent static analysis tools from complaining that a NULL value could be dereferenced. --- v2: Dropped patch 1: "ice: Add support for classid based queue selection" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jakub Kicinski
Mat Martineau says:

====================
mptcp: Improve MPTCP-level window tracking

This series improves MPTCP receive window compliance with RFC 8684 and helps increase throughput on high-speed links. Note that patch 3 makes a change in tcp_output.c.

For the details, Paolo says:

I've been chasing bad/unstable performance with multiple subflows on very high speed links. It looks like the root cause is due to the current mptcp-level congestion window handling. There are apparently a few different sub-issues:

  - The rcv_wnd is not effectively shared on the tx side, as each subflow takes into account only the value received by the underlying TCP connection. This is addressed in patch 1/5.

  - The mptcp-level offered wnd right edge is currently allowed to shrink. Reading section 3.3.4.:

    """
    The receive window is relative to the DATA_ACK. As in TCP, a receiver MUST NOT shrink the right edge of the receive window (i.e., DATA_ACK + receive window). The receiver will use the data sequence number to tell if a packet should be accepted at the connection level.
    """

    I read the above as we need to reflect window right-edge tracking on the wire, see patch 4/5.

  - The offered window right edge tracking can happen concurrently on multiple subflows, but there is no mutex protection. We need an additional atomic operation, still patch 4/5.

This series additionally bumps a few new MIBs to track all the above (ensure/observe that the suspected races actually take place).

I could not access again the host where the issue was so noticeable; still, in the current setup the tput changes from [6-18] Gbps to a very stable 19 Gbps.
====================

Link: https://lore.kernel.org/r/20220504215408.349318-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Paolo Abeni
Track the exceptional handling of MPTCP-level offered window with a few more counters for observability. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Paolo Abeni
As per RFC, the offered MPTCP-level window should never shrink. While we currently track the right edge, we don't enforce the above constraint on the wire. Additionally, concurrent xmit on different subflows can end up in an erroneous right edge update. Address the above by explicitly updating the announced window and protecting the update with an additional atomic operation (sic). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
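The constraint described in this commit (a right edge that never moves backward, even when several subflows update it at once) can be sketched in userspace C11 with a compare-and-swap loop. This is only an illustration, not the kernel patch: `struct msk_window` and `advance_right_edge()` are hypothetical names, and the plain 64-bit comparison ignores sequence-number wraparound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection-level state shared by all subflows. */
struct msk_window {
    _Atomic uint64_t right_edge; /* highest (DATA_ACK + window) announced so far */
};

/*
 * Propose a new right edge computed by one subflow.
 * Returns the edge that may be announced on the wire: the proposed value if
 * it moves the edge forward, otherwise the edge that was already offered.
 * Assumption of this sketch: no sequence wraparound handling.
 */
static uint64_t advance_right_edge(struct msk_window *w, uint64_t ack_seq,
                                   uint32_t win)
{
    assert(w != NULL);

    uint64_t proposed = ack_seq + win;
    uint64_t cur = atomic_load(&w->right_edge);

    while (proposed > cur) {
        /* On failure, cur is refreshed with the value another subflow stored. */
        if (atomic_compare_exchange_weak(&w->right_edge, &cur, proposed))
            return proposed;
    }
    return cur; /* never announce less than what was already offered */
}
```

A subflow that computes a smaller edge than the one already announced gets the stored value back, so the window it puts on the wire does not shrink.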
-
Committed by Paolo Abeni
The MPTCP RFC requires that the MPTCP-level receive window's right edge never moves backward. Currently the MPTCP code enforces this constraint while tracking the right edge, but it does not reflect it on the wire, as MPTCP lacks a suitable hook to update the TCP header accordingly. This change modifies the existing mptcp_write_options() hook, providing the current packet's TCP header to the MPTCP protocol, so that the next patch can implement the above-mentioned constraint. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Paolo Abeni
Bump a counter when snd_wnd is shared among subflows, for observability's sake. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Paolo Abeni
As per RFC, mptcp subflows use a "shared" snd_wnd: the effective window is the maximum among the current values received on all subflows. Without such a feature, a data transfer using multiple subflows could block. Window sharing is currently implemented on the RX side: __tcp_select_window uses the mptcp-level receive buffer to compute the announced window. That is not enough: the TCP stack will stick to the window size received on the given subflow; we need to propagate the msk window value on each subflow at xmit time. Change the packet scheduler to ignore the subflow-level window and use the msk-level one instead. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
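The "shared" send window described here can be illustrated with a small userspace sketch: the connection-level window end is the maximum of the window ends seen on the subflows, and the scheduler sizes its sends against that value instead of the per-subflow one. The types and function names below are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subflow state: right edge of the last window received. */
struct subflow {
    uint64_t wnd_end;
};

/* Connection-level (msk) window end: the maximum across all subflows. */
static uint64_t msk_wnd_end(const struct subflow *sf, size_t n)
{
    assert(sf != NULL && n > 0);

    uint64_t end = sf[0].wnd_end;
    for (size_t i = 1; i < n; i++) {
        if (sf[i].wnd_end > end)
            end = sf[i].wnd_end;
    }
    return end;
}

/* Bytes the scheduler may still send, given the connection-level snd_nxt. */
static uint64_t msk_send_space(const struct subflow *sf, size_t n,
                               uint64_t snd_nxt)
{
    uint64_t end = msk_wnd_end(sf, n);

    return end > snd_nxt ? end - snd_nxt : 0;
}
```

With two subflows whose window ends are 5000 and 8000 and snd_nxt at 6000, the first subflow on its own would see no space left, while the shared view still allows 2000 bytes. That is the stall the commit message refers to.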
-
Committed by Andy Shevchenko
There is an export_uuid() function which exports a uuid_t to a u8 array. Use it instead of the open-coded variant. This allows hiding the uuid_t internals. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220504091407.70661-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by David Ahern
msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the export and make it static in skbuff.c. Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jakub Kicinski
Make the drivers with custom tx napi weight call netif_napi_add_tx_weight(). Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jakub Kicinski
Switch net callers to the new API not requiring the NAPI_POLL_WEIGHT argument. Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jakub Kicinski
Remove a define which looks like an OS abstraction layer and makes spatch conversions on this driver problematic. Link: https://lore.kernel.org/r/20220504163939.551231-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Christophe Leroy
powerpc's asm/prom.h includes some headers that it doesn't need itself. In order to clean powerpc's asm/prom.h up in a further step, first clean all files that include asm/prom.h. Some files don't need asm/prom.h at all. For those ones, just remove inclusion of asm/prom.h. Some files don't need any of the items provided by asm/prom.h, but need some of the headers included by asm/prom.h. For those ones, add the needed headers that are brought by asm/prom.h at the moment and remove asm/prom.h. Some files really need asm/prom.h but also need some of the headers included by asm/prom.h. For those ones, leave asm/prom.h but also add the needed headers so that they can be removed from asm/prom.h in a later step. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Christophe Leroy
powerpc's <asm/prom.h> includes some headers that it doesn't need itself. In order to clean powerpc's <asm/prom.h> up in a further step, first clean all files that include <asm/prom.h>.

sungem_phy.c doesn't use any object provided by <asm/prom.h>. But removing inclusion of <asm/prom.h> leads to the following errors:

    CC drivers/net/sungem_phy.o
    drivers/net/sungem_phy.c: In function 'bcm5421_init':
    drivers/net/sungem_phy.c:448:42: error: implicit declaration of function 'of_get_parent'; did you mean 'dget_parent'? [-Werror=implicit-function-declaration]
      448 | struct device_node *np = of_get_parent(phy->platform_data);
          |                          ^~~~~~~~~~~~~
          |                          dget_parent
    drivers/net/sungem_phy.c:448:42: warning: initialization of 'struct device_node *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    drivers/net/sungem_phy.c:450:35: error: implicit declaration of function 'of_get_property' [-Werror=implicit-function-declaration]
      450 | if (np == NULL || of_get_property(np, "no-autolowpower", NULL))
          |                   ^~~~~~~~~~~~~~~

Remove <asm/prom.h> from included headers but add <linux/of.h> to handle the above.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/f7a7fab3ec5edf803d934fca04df22631c2b449d.1651662885.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Eyal Birger
The commit referenced in the "Fixes" tag added the SO_RCVMARK socket option for receiving the skb mark in the ancillary data. Since this is a new capability, and exposes admin configured details regarding the underlying network setup to sockets, let's align the needed capabilities with those of SO_MARK. Fixes: 6fd1d51c ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20220504095459.2663513-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jakub Kicinski
This reverts commit 5e927a9f, reversing changes made to cfc1d91a. The discussion is still ongoing so let's remove the uAPI until the discussion settles. Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/ Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Committed by Jakub Kicinski
tools/testing/selftests/net/forwarding/Makefile f62c5acc ("selftests/net/forwarding: add missing tests to Makefile") 50fe062c ("selftests: forwarding: new test, verify host mdb entries") https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jacob Keller
The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Jacob Keller
This function definition was missing a comment describing its implementation. Add one. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Jacob Keller
The comment explaining ice_reset_vf has an extraneous "the" with the "if the resets are disabled". Remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Jacob Keller
Since commit fe99d1c0 ("ice: make ice_reset_all_vfs void"), the ice_reset_all_vfs function has not returned anything. The function comment still indicated it did. Fix this. While here, also add a line to clarify the function resets all VFs at once in response to hardware resets such as a PF reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Jacob Keller
The ice_get_vf_vsi function can return NULL in some cases, such as if handling messages during a reset where the VSI is being removed and recreated. Several places throughout the driver do not bother to check whether this VSI pointer is valid. Static analysis tools may report issues because they detect paths where a potentially NULL pointer could be dereferenced. Fix this by checking the return value of ice_get_vf_vsi everywhere. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Jacob Keller
The debug prints in ice_vf_fdir_dump_info do not end in newlines. This can look confusing when reading the kernel log, as the next print will immediately continue on the same line. Fix this by adding the forgotten newline. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Michal Swiatkowski
Switch id should be the same for each netdevice on a driver. The id must be unique between devices on the same system, but does not need to be unique between devices on different systems. The switch id is used to locate ports on a switch and to know if aggregated ports belong to the same switch. To meet these requirements, use pci_get_dsn as the switch id value, as this is a unique value for each device on the same system. Implementing switch id is needed by automatic tools for kubernetes. Set switch id by setting devlink port attributes and calling devlink_port_attrs_set while creating pf (for uplink) and vf (for representor) devlink port. To get switch id (in switchdev mode): cat /sys/class/net/$PF0/phys_switch_id Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Wojciech Drewek
When the number of words exceeds ICE_MAX_CHAIN_WORDS, -ENOSPC should be returned, not -EINVAL. Do not overwrite this error code in ice_add_tc_flower_adv_fltr. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Suggested-by: Marcin Szycik <marcin.szycik@linux.intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
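The error-handling pattern this commit describes (return -ENOSPC when a limit is exceeded, and let the caller pass that code through unchanged) can be sketched in plain C. The functions and the limit below are hypothetical stand-ins, not the ice driver's code or the real value of ICE_MAX_CHAIN_WORDS.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder limit for the sketch; not the driver's ICE_MAX_CHAIN_WORDS. */
#define MAX_CHAIN_WORDS 5u

/* Hypothetical low-level helper: validates the number of lookup words. */
static int add_rule(unsigned int word_cnt)
{
    if (word_cnt == 0)
        return -EINVAL;  /* malformed request */
    if (word_cnt > MAX_CHAIN_WORDS)
        return -ENOSPC;  /* well-formed, but it does not fit */
    return 0;
}

/* Hypothetical caller: propagates the helper's error code as-is. */
static int add_filter(unsigned int word_cnt)
{
    int err = add_rule(word_cnt);

    assert(err <= 0); /* kernel-style convention: 0 or a negative errno */
    if (err)
        return err;   /* do not flatten every failure into -EINVAL */
    return 0;
}
```

Keeping the two codes distinct lets the caller tell "the request was invalid" apart from "the request was valid but too large".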
-
Committed by Maciej Fijalkowski
Both ice_idc.c and ice_virtchnl.c carry their own implementation of a helper function that looks up a given VSI based on the provided vsi_num. Their functionality is the same, so let's introduce a common function in ice.h that both of the mentioned sites will use. This is strictly a cleanup; no functionality is changed. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Committed by Wan Jiabing
Fix the following coccicheck warning: ./drivers/net/ethernet/intel/ice/ice_gnss.c:79:26-27: WARNING opportunity for min() Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
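The kind of change coccicheck suggests here is replacing an open-coded ternary with a min helper. A userspace imitation is sketched below. The `min_t` macro defined here only mimics the shape of the kernel helper (unlike the kernel's version, it evaluates its arguments more than once), and `bytes_to_read()` is a hypothetical function, not code from ice_gnss.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace imitation of min_t(): cast both operands to one type, then
 * compare. Assumption of this sketch: arguments have no side effects.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: how many bytes to fetch in the next chunk. */
static size_t bytes_to_read(size_t bytes_left, size_t max_chunk)
{
    assert(max_chunk > 0);

    /* Open-coded form: bytes_left < max_chunk ? bytes_left : max_chunk */
    return min_t(size_t, bytes_left, max_chunk);
}
```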
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Committed by Linus Torvalds
Pull networking fixes from Paolo Abeni:

"Including fixes from can, rxrpc and wireguard.

Previous releases - regressions:
  - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
  - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
  - rds: acquire netns refcount on TCP sockets
  - rxrpc: enable IPv6 checksums on transport socket
  - nic: hinic: fix bug of wq out of bound access
  - nic: thunder: don't use pci_irq_vector() in atomic context
  - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS flag
  - nic: mlx5e:
      - lag, fix use-after-free in fib event handler
      - fix deadlock in sync reset flow

Previous releases - always broken:
  - tcp: fix insufficient TCP source port randomness
  - can: grcan: grcan_close(): fix deadlock
  - nfc: reorder destructive operations in to avoid bugs

Misc:
  - wireguard: improve selftests reliability"

* tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  NFC: netlink: fix sleep in atomic bug when firmware download timeout
  selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
  tcp: drop the hash_32() part from the index calculation
  tcp: increase source port perturb table to 2^16
  tcp: dynamically allocate the perturb table used by source ports
  tcp: add small random increments to the source port
  tcp: resalt the secret every 10 seconds
  tcp: use different parts of the port_offset for index and offset
  secure_seq: use the 64 bits of the siphash for port offset calculation
  wireguard: selftests: set panic_on_warn=1 from cmdline
  wireguard: selftests: bump package deps
  wireguard: selftests: restore support for ccache
  wireguard: selftests: use newer toolchains to fill out architectures
  wireguard: selftests: limit parallelism to $(nproc) tests at once
  wireguard: selftests: make routing loop test non-fatal
  net/mlx5: Fix matching on inner TTC
  net/mlx5: Avoid double clear or set of sync reset requested
  net/mlx5: Fix deadlock in sync reset flow
  net/mlx5e: Fix trust state reset in reload
  net/mlx5e: Avoid checking offload capability in post_parse action
  ...
-
- 05 May, 2022 12 commits
-
-
Committed by Casper Andersson
Handle adding and removing MDB entries for host. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Link: https://lore.kernel.org/r/20220503093922.1630804-1-casper.casan@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Duoming Zhou
There is a sleep-in-atomic bug that could cause a kernel panic during the firmware download process. The root cause is that nlmsg_new with the GFP_KERNEL parameter is called in fw_dnld_timeout, which is a timer handler. The call trace is shown below:

    BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
    Call Trace:
    kmem_cache_alloc_node
    __alloc_skb
    nfc_genl_fw_download_done
    call_timer_fn
    __run_timers.part.0
    run_timer_softirq
    __do_softirq
    ...

nlmsg_new with the GFP_KERNEL parameter may sleep during memory allocation, and the timer handler runs as the result of a "software interrupt", so it must not call any function that could sleep. This patch changes the allocation mode of the netlink message from GFP_KERNEL to GFP_ATOMIC in order to prevent the sleep-in-atomic bug. The GFP_ATOMIC flag allows the memory allocation to be performed in atomic context.

Fixes: 9674da87 ("NFC: Add firmware upload netlink command") Fixes: 9ea7187c ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Jakub Kicinski
Vladimir Oltean says: ==================== Ocelot VCAP cleanups This is a series of minor code cleanups brought to the Ocelot switch driver logic for VCAP filters. - don't use list_for_each_safe() in ocelot_vcap_filter_add_to_block - don't use magic numbers for OCELOT_POLICER_DISCARD ==================== Link: https://lore.kernel.org/r/20220503120150.837233-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Vladimir Oltean
OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU port from receiving packets removed from the forwarding path. The hardcoded initialization done for it in ocelot_vcap_init() is confusing. All we need from it is to have a rate and a burst size of 0. Reuse qos_policer_conf_set() for that purpose. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Vladimir Oltean
The "port" argument is used for nothing else except printing on the error path. Print errors on behalf of the policer index, which is less confusing anyway. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Vladimir Oltean
Unify the code paths for adding to an empty list and to a list with elements by keeping a "pos" list_head element that indicates where to insert. Initialize "pos" with the list head itself in case list_for_each_entry() doesn't iterate over any element. Note that list_for_each_safe() isn't needed because no element is removed from the list while iterating. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
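The "pos" technique described in this commit can be illustrated with a userspace imitation of a circular doubly-linked list. The sketch below is not the Ocelot driver code: `struct filter`, `filter_add_sorted()` and the priority ordering are hypothetical, and `list_add_before()` only reproduces the effect of the kernel's `list_add_tail(new, pos)`.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Insert new_node right before pos (same effect as list_add_tail(new, pos)). */
static void list_add_before(struct list_head *new_node, struct list_head *pos)
{
    new_node->prev = pos->prev;
    new_node->next = pos;
    pos->prev->next = new_node;
    pos->prev = new_node;
}

/* Hypothetical entry; 'list' is the first member so a cast recovers it. */
struct filter {
    struct list_head list;
    int prio;
};

static void filter_add_sorted(struct list_head *head, struct filter *f)
{
    /* Default: insert before the head itself, i.e. at the tail. This also
     * covers the empty list, where the loop below never runs. */
    struct list_head *pos = head;
    struct list_head *it;

    assert(head != NULL && f != NULL);

    for (it = head->next; it != head; it = it->next) {
        const struct filter *cur = (const struct filter *)it;

        if (f->prio < cur->prio) {
            pos = it; /* first element that must come after f */
            break;
        }
    }
    list_add_before(&f->list, pos);
}
```

Because `pos` starts out as the list head, the empty-list case and the append-at-end case need no separate branch: both fall through to the same insertion call.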
-
Committed by Vladimir Oltean
This makes no functional difference but helps in minimizing the delta for a future change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Vladimir Oltean
list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent; use the latter form to unify with the case where the list is empty later. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Michael Walle
In commit 4fdabd50 ("dt-bindings: net: lan966x: remove PHY reset") the PHY reset was removed, but I failed to remove it from the example. Fix it. Fixes: 4fdabd50 ("dt-bindings: net: lan966x: remove PHY reset") Reported-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220503132038.2714128-1-michael@walle.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Vladimir Oltean
As discussed here with Ido Schimmel: https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/ the default conform-exceed action is "reclassify", for a reason we don't really understand. The point is that hardware can't offload that police action, so not specifying "conform-exceed" was always wrong, even though the command used to work in hardware (but not in software) until the kernel started adding validation for it. Fix the command used by the selftest by making the policer drop on exceed, and pass the packet to the next action (goto) on conform. Fixes: 8cd6b020 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Jakub Kicinski
Willy Tarreau says:

====================
insufficient TCP source port randomness

In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad report being able to accurately identify a client by forcing it to emit only 40 times more connections than the number of entries in the table_perturb[] table, which is indexed by hashing the connection tuple. The current 2^8 setting allows them to perform that attack with only 10k connections, which is not hard to achieve in a few seconds.

Eric, Amit and I have been working on this for a few weeks now imagining, testing and eliminating a number of approaches that Amit and his team were still able to break or that were found to be too risky or too expensive, and ended up with the simple improvements in this series that resist the attack, don't degrade the performance, and preserve a reliable port selection algorithm to avoid connection failures, including the odd/even port selection preference that allows bind() to always find a port quickly even under strong connect() stress.

The approach relies on several factors:

  - resalting the hash secret that's used to choose the table_perturb[] entry every 10 seconds to eliminate slow attacks and force the attacker to forget everything that was learned after this delay. This already eliminates most of the problem because if a client stays silent for more than 10 seconds there's no link between the previous and the next patterns, and 10s isn't yet frequent enough to cause too frequent repetition of the same port that may induce a connection failure;

  - adding small random increments to the source port. Previously, a random 0 or 1 was added every 16 ports. Now a random 0 to 7 is added after each port. This means that with the default 32768-60999 range, a worst case rollover happens after 1764 connections, and an average of 3137. This doesn't stop statistical attacks but requires significantly more iterations of the same attack to confirm a guess.

  - increasing the table_perturb[] size from 2^8 to 2^16, which Amit says will require 2.6 million connections to be attacked with the changes above, making it pointless to get a fingerprint that will only last 10 seconds. Due to the size, the table was made dynamic.

  - a few minor improvements on the bits used from the hash, to eliminate some unfortunate correlations that may possibly have been exploited to design future attack models.

These changes were tested under the most extreme conditions, up to 1.1 million connections per second to one and a few targets, showing no performance regression, and only 2 connection failures within 13 billion, which is less than 2^-32 and perfectly within usual values.

The series is split into small reviewable changes and was already reviewed by Amit and Eric.
====================

Link: https://lore.kernel.org/r/20220502084614.24123-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
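The rollover figures quoted in the cover letter (about 1764 connections in the worst case, about 3137 on average, for the default 32768-60999 range) can be reproduced with a short calculation. The sketch below assumes that each connection advances the port offset by 2 plus twice a uniformly random value in 0..7; that step model is an assumption made for this illustration, chosen because it matches the quoted numbers, and is not taken from the text above.

```c
#include <assert.h>

#define PORT_LOW  32768u
#define PORT_HIGH 60999u

/* Assumed step for a random draw r in 0..7: 2 + 2*r, i.e. 2..16 ports. */
static double step_for(unsigned int r)
{
    assert(r <= 7u);
    return 2.0 + 2.0 * (double)r;
}

/* Mean step when r is uniform over 0..7. */
static double mean_step(void)
{
    double sum = 0.0;

    for (unsigned int r = 0; r <= 7u; r++)
        sum += step_for(r);
    return sum / 8.0;
}

/* Connections needed to walk the whole port range at a given step. */
static double conns_per_rollover(double step)
{
    assert(step > 0.0);
    return (double)(PORT_HIGH - PORT_LOW + 1u) / step;
}
```

Under that assumption the range holds 28232 ports, the largest step is 16 and the mean step is 9, so `conns_per_rollover(step_for(7))` gives 1764.5 and `conns_per_rollover(mean_step())` gives roughly 3136.9, in line with the 1764 and 3137 quoted in the cover letter.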
-
Committed by Willy Tarreau
In commit 190cc824 ("tcp: change source port randomizarion at connect() time"), the table_perturb[] array was introduced and an index was taken from the port_offset via hash_32(). But it turns out that hash_32() performs a multiplication while the input here comes from the output of SipHash in secure_seq, which is well distributed enough to avoid the need for yet another hash. Suggested-by: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-