- 27 10月, 2020 1 次提交
-
-
由 Jeff Vander Stoep 提交于
During __vsock_create() CAP_NET_ADMIN is used to determine if the vsock_sock->trusted should be set to true. This value is used later for determing if a remote connection should be allowed to connect to a restricted VM. Unfortunately, if the caller doesn't have CAP_NET_ADMIN, an audit message such as an selinux denial is generated even if the caller does not want a trusted socket. Logging errors on success is confusing. To avoid this, switch the capable(CAP_NET_ADMIN) check to the noaudit version. Reported-by: NRoman Kiryanov <rkir@google.com> https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/Signed-off-by: NJeff Vander Stoep <jeffv@google.com> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 24 10月, 2020 1 次提交
-
-
由 Arjun Roy 提交于
With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where: 1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets. In this case, we advertise 0 rwnd (due to #3) but the application does not drain the receive queue (no wakeup because of #1 and #2) so the flow stalls. Modify the heuristic for SO_RCVLOWAT so that, if we are advertising rwnd<=rcv_mss, force a wakeup to prevent a stall. Without this patch, setting tcp_rmem to 6143 and disabling TCP autotune causes a stalled flow. With this patch, no stall occurs. This is with RPC-style traffic with large messages. Fixes: 03f45c88 ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: NArjun Roy <arjunroy@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 23 10月, 2020 2 次提交
-
-
由 Neal Cardwell 提交于
In the header prediction fast path for a bulk data receiver, if no data is newly acknowledged then we do not call tcp_ack() and do not call tcp_ack_update_window(). This means that a bulk receiver that receives large amounts of data can have the incoming sequence numbers wrap, so that the check in tcp_may_update_window fails: after(ack_seq, tp->snd_wl1) If the incoming receive windows are zero in this state, and then the connection that was a bulk data receiver later wants to send data, that connection can find itself persistently rejecting the window updates in incoming ACKs. This means the connection can persistently fail to discover that the receive window has opened, which in turn means that the connection is unable to send anything, and the connection's sending process can get permanently "stuck". The fix is to update snd_wl1 in the header prediction fast path for a bulk data receiver, so that it keeps up and does not see wrapping problems. This fix is based on a very nice and thorough analysis and diagnosis by Apollon Oikonomopoulos (see link below). This is a stable candidate but there is no Fixes tag here since the bug predates current git history. Just for fun: looks like the bug dates back to when header prediction was added in Linux v2.1.8 in Nov 1996. In that version tcp_rcv_established() was added, and the code only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer: receiver" code path it does not call tcp_ack(). This fix seems to apply cleanly at least as far back as v3.2. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NNeal Cardwell <ncardwell@google.com> Reported-by: NApollon Oikonomopoulos <apoikos@dmesg.gr> Tested-by: NApollon Oikonomopoulos <apoikos@dmesg.gr> Link: https://www.spinics.net/lists/netdev/msg692430.htmlAcked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201022143331.1887495-1-ncardwell.kernel@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ke Li 提交于
In setsockopt(SO_MAX_PACING_RATE) on 64bit systems, sk_max_pacing_rate, after extended from 'u32' to 'unsigned long', takes unintentionally hiked value whenever assigned from an 'int' value with MSB=1, due to binary sign extension in promoting s32 to u64, e.g. 0x80000000 becomes 0xFFFFFFFF80000000. Thus inflated sk_max_pacing_rate causes subsequent getsockopt to return ~0U unexpectedly. It may also result in increased pacing rate. Fix by explicitly casting the 'int' value to 'unsigned int' before assigning it to sk_max_pacing_rate, for zero extension to happen. Fixes: 76a9ebe8 ("net: extend sk_pacing_rate to unsigned long") Signed-off-by: NJi Li <jli@akamai.com> Signed-off-by: NKe Li <keli@akamai.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 22 10月, 2020 3 次提交
-
-
由 Pablo Neira Ayuso 提交于
Similar to 7980d2ea ("ipvs: clear skb->tstamp in forwarding path"). fq qdisc requires tstamp to be cleared in forwarding path. Fixes: 8203e2d8 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Di Zhu 提交于
"ip addr show" command execute error when we have a physical network card with a large number of VFs The return value of if_nlmsg_size() in rtnl_calcit() will exceed range of u16 data type when any network cards has a larger number of VFs. rtnl_vfinfo_size() will significant increase needed dump size when the value of num_vfs is larger. Eventually we get a wrong value of min_ifinfo_dump_size because of overflow which decides the memory size needed by netlink dump and netlink_dump() will return -EMSGSIZE because of not enough memory was allocated. So fix it by promoting min_dump_alloc data type to u32 to avoid whole netlink message size overflow and it's also align with the data type of struct netlink_callback{}.min_dump_alloc which is assigned by return value of rtnl_calcit() Signed-off-by: NDi Zhu <zhudi21@huawei.com> Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Toke Høiland-Jørgensen 提交于
Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen. [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
-
- 21 10月, 2020 12 次提交
-
-
由 Matthieu Baerts 提交于
Like TCP, MPTCP cannot be compiled as a module. Obviously, MPTCP IPv6' support also depends on CONFIG_IPV6. But not all functions from IPv6 code are exported. To simplify the code and reduce modifications outside MPTCP, it was decided from the beginning to support MPTCP with IPv6 only if CONFIG_IPV6 was built inlined. That's also why CONFIG_MPTCP_IPV6 was created. More modifications are needed to support CONFIG_IPV6=m. Even if it was not explicit, until recently, we were forcing CONFIG_IPV6 to be built-in because we had "select IPV6" in Kconfig. Now that we have "depends on IPV6", we have to explicitly set "IPV6=y" to force CONFIG_IPV6 not to be built as a module. In other words, we can now only have CONFIG_MPTCP_IPV6=y if CONFIG_IPV6=y. Note that the new dependency might hide the fact IPv6 is not supported in MPTCP even if we have CONFIG_IPV6=m. But selecting IPV6 like we did before was forcing it to be built-in while it was maybe not what the user wants. Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Fixes: 010b430d ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it") Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201021105154.628257-1-matthieu.baerts@tessares.netSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alexander Ovechkin 提交于
mpls_iptunnel is used only for mpls encapsuation, and if encaplusated packet is larger than MTU we need mpls_gso for segmentation. Signed-off-by: NAlexander Ovechkin <ovov@yandex-team.ru> Acked-by: NDmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/r/20201020114333.26866-1-ovov@yandex-team.ruSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Davide Caratti 提交于
the following command # tc action add action tunnel_key \ > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0 generates the following splat: BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] Write of size 4 at addr ffff88813f5f1cc8 by task tc/873 CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x99/0xcb print_address_description.constprop.7+0x1e/0x230 kasan_report.cold.13+0x37/0x7c tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] tunnel_key_init+0x160c/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f872a96b338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338 RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000 Allocated by task 873: kasan_save_stack+0x19/0x40 __kasan_kmalloc.constprop.7+0xc1/0xd0 __kmalloc+0x151/0x310 metadata_dst_alloc+0x20/0x40 tunnel_key_init+0xfff/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88813f5f1c00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 200 bytes inside of 256-byte region [ffff88813f5f1c00, ffff88813f5f1d00) The buggy address belongs to the page: page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0 head:0000000011b48a19 order:1 compound_mapcount:0 flags: 0x17ffffc0010200(slab|head) raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc ^ ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for the tunnel metadata, but then it expects additional bytes to store tunnel specific metadata with tunnel_key_copy_opts(). Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the size previously computed by tunnel_key_get_opts_len(), like it's done for IPv4 tunnels. Fixes: 0ed5269f ("net/sched: add tunnel option support to act_tunnel_key") Reported-by: NShuang Li <shuali@redhat.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Guillaume Nault 提交于
We need to jump to the "err_out_locked" label when tcf_gate_get_entries() fails. Otherwise, tc_setup_flow_action() exits with ->tcfa_lock still held. Fixes: d29bdd69 ("net: schedule: add action gate offloading") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/12f60e385584c52c22863701c0185e40ab08a7a7.1603207948.git.gnault@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geert Uytterhoeven 提交于
MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like is done for all other IPv6 features. Fixes: f870fa0b ("mptcp: Add MPTCP socket stubs") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201020073839.29226-1-geert@linux-m68k.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Defang Bo 提交于
Check that the NFC_ATTR_FIRMWARE_NAME attributes are provided by the netlink client prior to accessing them.This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Similar to commit a0323b97 ("nfc: Ensure presence of required attributes in the activate_target handler"). Fixes: 9674da87 ("NFC: Add firmware upload netlink command") Signed-off-by: NDefang Bo <bodefang@126.com> Link: https://lore.kernel.org/r/1603107538-4744-1-git-send-email-bodefang@126.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geert Uytterhoeven 提交于
MPTCP_KUNIT_TESTS selects MPTCP, thus enabling an optional feature the user may not want to enable. Fix this by making the test depend on MPTCP instead. Fixes: a00a5822 ("mptcp: move crypto test to KUNIT") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201019113240.11516-1-geert@linux-m68k.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
Move mptcp_options_received's port initialization from mptcp_parse_option to mptcp_get_options, put it together with the other fields initializations of mptcp_options_received. Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
Initialize mptcp_options_received's ahmac to zero, otherwise it will be a random number when receiving ADD_ADDR suboption with echo-flag=1. Fixes: 3df523ab ("mptcp: Add ADD_ADDR handling") Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Roi Dayan 提交于
Need to use the udp header type and not tcp. Fixes: 9c26ba9b ("net/sched: act_ct: Instantiate flow table entry actions") Signed-off-by: NRoi Dayan <roid@nvidia.com> Reviewed-by: NPaul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/20201019090244.3015186-1-roid@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Martijn de Gouw 提交于
When the passed token is longer than 4032 bytes, the remaining part of the token must be copied from the rqstp->rq_arg.pages. But the copy must make sure it happens in a consecutive way. With the existing code, the first memcpy copies 'length' bytes from argv->iobase, but since the header is in front, this never fills the whole first page of in_token->pages. The mecpy in the loop copies the following bytes, but starts writing at the next page of in_token->pages. This leaves the last bytes of page 0 unwritten. Symptoms were that users with many groups were not able to access NFS exports, when using Active Directory as the KDC. Signed-off-by: NMartijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Fixes: 5866efa8 "SUNRPC: Fix svcauth_gss_proxy_init()" Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
Its possible that using AUTH_SYS and mountd manage-gids option a user may hit the 8k RPC channel buffer limit. This have been observed on field, causing unanswered RPCs on clients after mountd fails to write on channel : rpc.mountd[11231]: auth_unix_gid: error writing reply Userland nfs-utils uses a buffer size of 32k (RPC_CHAN_BUF_SIZE), so lets match those two. Signed-off-by: NRoberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 20 10月, 2020 7 次提交
-
-
由 Saeed Mirzamohammadi 提交于
This patch fixes the issue due to: BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40 Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244 The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds. This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue. Add nft_expr_more() and use it to fix this problem. Signed-off-by: NSaeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Timothée COCAULT 提交于
Fixes an error causing small packets to get dropped. skb_ensure_writable expects the second parameter to be a length in the ethernet payload.=20 If we want to write the ethernet header (src, dst), we should pass 0. Otherwise, packets with small payloads (< ETH_ALEN) will get dropped. Fixes: c1a83116 ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable") Signed-off-by: NTimothée COCAULT <timothee.cocault@orange.com> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Georg Kohmann 提交于
Fragmented ndisc packets assembled in netfilter not dropped as specified in RFC 6980, section 5. This behaviour breaks TAHI IPv6 Core Conformance Tests v6LC.2.1.22/23, V6LC.2.2.26/27 and V6LC.2.3.18. Setting IP6SKB_FRAGMENTED flag during reassembly. References: commit b800c3b9 ("ipv6: drop fragmented ndisc packets by default (RFC 6980)") Signed-off-by: NGeorg Kohmann <geokohma@cisco.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Francesco Ruggeri 提交于
If the first packet conntrack sees after a re-register is an outgoing keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to SND.NXT-1. When the peer correctly acknowledges SND.NXT, tcp_in_window fails check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and returns false, which cascades into nf_conntrack_in setting skb->_nfct = 0 and in later conntrack iptables rules not matching. In cases where iptables are dropping packets that do not match conntrack rules this can result in idle tcp connections to time out. v2: adjust td_end when getting the reply rather than when sending out the keepalive packet. Fixes: f94e6380 ("netfilter: conntrack: reset tcp maxwin on re-register") Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 longguang.yue 提交于
Outputting client,virtual,dst addresses info when tcp state changes, which makes the connection debug more clear Signed-off-by: Nlongguang.yue <bigclouds@163.com> Acked-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Ido Schimmel 提交于
While insertion of 16k nexthops all using the same netdev ('dummy10') takes less than a second, deletion takes about 130 seconds: # time -p ip -b nexthop.batch real 0.29 user 0.01 sys 0.15 # time -p ip link set dev dummy10 down real 131.03 user 0.06 sys 0.52 This is because of repeated calls to synchronize_rcu() whenever a nexthop is removed from a nexthop group: # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K ... b'finish_task_switch' b'schedule' b'schedule_timeout' b'wait_for_completion' b'__wait_rcu_gp' b'synchronize_rcu.part.0' b'synchronize_rcu' b'__remove_nexthop' b'remove_nexthop' b'nexthop_flush_dev' b'nh_netdev_event' b'raw_notifier_call_chain' b'call_netdevice_notifiers_info' b'__dev_notify_flags' b'dev_change_flags' b'do_setlink' b'__rtnl_newlink' b'rtnl_newlink' b'rtnetlink_rcv_msg' b'netlink_rcv_skb' b'rtnetlink_rcv' b'netlink_unicast' b'netlink_sendmsg' b'____sys_sendmsg' b'___sys_sendmsg' b'__sys_sendmsg' b'__x64_sys_sendmsg' b'do_syscall_64' b'entry_SYSCALL_64_after_hwframe' - ip (277) 126554955 Since nexthops are always deleted under RTNL, synchronize_net() can be used instead. It will call synchronize_rcu_expedited() which only blocks for several microseconds as opposed to multiple milliseconds like synchronize_rcu(). With this patch deletion of 16k nexthops takes less than a second: # time -p ip link set dev dummy10 down real 0.12 user 0.00 sys 0.04 Tested with fib_nexthops.sh which includes torture tests that prompted the initial change: # ./fib_nexthops.sh ... Tests passed: 134 Tests failed: 0 Fixes: 90f33bff ("nexthops: don't modify published nexthop groups") Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Christian Eggers 提交于
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and KSZ9893 switches also use tail tags. Fixes: 7a6ffe76 ("net: dsa: point out the tail taggers") Signed-off-by: NChristian Eggers <ceggers@arri.de> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 19 10月, 2020 2 次提交
-
-
由 Taehee Yoo 提交于
dev->unlink_list is reused unless dev is deleted. So, list_del() should not be used. Due to using list_del(), dev->unlink_list can't be reused so that dev->nested_level update logic doesn't work. In order to fix this bug, list_del_init() should be used instead of list_del(). Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 ip link set bond1 nomaster Splat looks like: [ 255.750458][ T1030] ============================================ [ 255.751967][ T1030] WARNING: possible recursive locking detected [ 255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted [ 255.754553][ T1030] -------------------------------------------- [ 255.756047][ T1030] ip/1030 is trying to acquire lock: [ 255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150 [ 255.760056][ T1030] [ 255.760056][ T1030] but task is already holding lock: [ 255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.764581][ T1030] [ 255.764581][ T1030] other info that might help us debug this: [ 255.766645][ T1030] Possible unsafe locking scenario: [ 255.766645][ T1030] [ 255.768566][ T1030] CPU0 [ 255.769415][ T1030] ---- [ 255.770259][ T1030] lock(&dev_addr_list_lock_key/1); [ 255.771629][ T1030] lock(&dev_addr_list_lock_key/1); [ 255.772994][ T1030] [ 255.772994][ T1030] *** DEADLOCK *** [ 255.772994][ T1030] [ 255.775091][ T1030] May be due to missing lock nesting notation [ 255.775091][ T1030] [ 255.777182][ T1030] 2 locks held by ip/1030: [ 255.778299][ T1030] #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0 [ 255.780600][ T1030] #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.783411][ T1030] [ 255.783411][ T1030] stack backtrace: [ 255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772 [ 255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 255.789030][ T1030] Call Trace: [ 255.789850][ T1030] dump_stack+0x99/0xd0 [ 255.790882][ T1030] __lock_acquire.cold.71+0x166/0x3cc [ 255.792285][ T1030] ? register_lock_class+0x1a30/0x1a30 [ 255.793619][ T1030] ? rcu_read_lock_sched_held+0x91/0xc0 [ 255.794963][ T1030] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 255.796246][ T1030] lock_acquire+0x1b8/0x850 [ 255.797332][ T1030] ? dev_mc_sync_multiple+0xc2/0x150 [ 255.798624][ T1030] ? bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.800039][ T1030] ? check_flags+0x50/0x50 [ 255.801143][ T1030] ? lock_contended+0xd80/0xd80 [ 255.802341][ T1030] _raw_spin_lock_nested+0x2e/0x70 [ 255.803592][ T1030] ? dev_mc_sync_multiple+0xc2/0x150 [ 255.804897][ T1030] dev_mc_sync_multiple+0xc2/0x150 [ 255.806168][ T1030] bond_enslave+0x3d58/0x43e0 [bonding] [ 255.807542][ T1030] ? __lock_acquire+0xe53/0x51b0 [ 255.808824][ T1030] ? bond_update_slave_arr+0xdc0/0xdc0 [bonding] [ 255.810451][ T1030] ? check_chain_key+0x236/0x5e0 [ 255.811742][ T1030] ? mutex_is_locked+0x13/0x50 [ 255.812910][ T1030] ? rtnl_is_locked+0x11/0x20 [ 255.814061][ T1030] ? netdev_master_upper_dev_get+0xf/0x120 [ 255.815553][ T1030] do_setlink+0x94c/0x3040 [ ... ] Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com Fixes: 1fc70edb ("net: core: add nested_level variable in net_device") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Eelco Chaudron 提交于
The flow_lookup() function uses per CPU variables, which must be called with BH disabled. However, this is fine in the general NAPI use case where the local BH is disabled. But, it's also called from the netlink context. The below patch makes sure that even in the netlink path, the BH is disabled. In addition, u64_stats_update_begin() requires a lock to ensure one writer which is not ensured here. Making it per-CPU and disabling NAPI (softirq) ensures that there is always only one writer. Fixes: eac87c41 ("net: openvswitch: reorder masks array based on usage") Reported-by: NJuri Lelli <jlelli@redhat.com> Signed-off-by: NEelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuildSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 10月, 2020 5 次提交
-
-
由 Eric Dumazet 提交于
Keyu Man reported that the ICMP rate limiter could be used by attackers to get useful signal. Details will be provided in an upcoming academic publication. Our solution is to add some noise, so that the attackers no longer can get help from the predictable token bucket limiter. Fixes: 4cdf507d ("icmp: add a global rate limitation") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NKeyu Man <kman001@ucr.edu> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Hoang Huu Le 提交于
In commit 16ad3f40 ("tipc: introduce variable window congestion control"), we applied the algorithm to select window size from minimum window to the configured maximum window for unicast link, and, besides we chose to keep the window size for broadcast link unchanged and equal (i.e fix window 50) However, when setting maximum window variable via command, the window variable was re-initialized to unexpect value (i.e 32). We fix this by updating the fix window for broadcast as we stated. Fixes: 16ad3f40 ("tipc: introduce variable window congestion control") Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Hoang Huu Le 提交于
The queue limit of the broadcast link is being calculated base on initial MTU. However, when MTU value changed (e.g manual changing MTU on NIC device, MTU negotiation etc.,) we do not re-calculate queue limit. This gives throughput does not reflect with the change. So fix it by calling the function to re-calculate queue limit of the broadcast link. Acked-by: NJon Maloy <jmaloy@redhat.com> Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dan Aloni 提交于
This was discovered using O_DIRECT at the client side, with small unaligned file offsets or IOs that span multiple file pages. Fixes: e248aa7b ("svcrdma: Remove max_sge check at connect time") Signed-off-by: NDan Aloni <dan@kernelim.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Artur Molchanov 提交于
Fix returning value for sysctl sunrpc.transports. Return error code from sysctl proc_handler function proc_do_xprt instead of number of the written bytes. Otherwise sysctl returns random garbage for this key. Since v1: - Handle negative returned value from memory_read_from_buffer as an error Signed-off-by: NArtur Molchanov <arturmolchanov@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 16 10月, 2020 7 次提交
-
-
由 Jakub Kicinski 提交于
This reverts commit 1d273fcc. Alexei points out there's nothing implying headers will be built and therefore exist under usr/include, so this fix doesn't make much sense. Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Dewar 提交于
If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is called unconditionally on skb_verdict, even though it may be NULL. Fix and tidy up error path. Fixes: 743df8b7 ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly") Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL) Signed-off-by: NAlex Dewar <alex.dewar90@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com
-
由 Lorenz Bauer 提交于
The sparse checker currently outputs the following warnings: include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit Add the necessary __acquires and __release annotations to make the iterator locking schema palatable to sparse. Also add __must_hold for good measure. The kernel codebase uses both __acquires(rcu) and __acquires(RCU). I couldn't find any guidance which one is preferred, so I used what is easier to type out. Fixes: 03653515 ("net: Allow iterating sockmap and sockhash") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com
-
由 Davide Caratti 提交于
nftables payload statements are used to mangle SCTP headers, but they can only replace the Internet Checksum. As a consequence, nftables rules that mangle sport/dport/vtag in SCTP headers potentially generate packets that are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g the rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'. Fix this extending uAPI definitions and L4 checksum update function, in a way that userspace programs (e.g. nft) can instruct the kernel to compute CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES is 'y' or 'm' in the kernel build configuration. Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Yonghong Song 提交于
Commit 4fc427e0 ("ipv6_route_seq_next should increase position index") tried to fix the issue where seq_file pos is not increased if a NULL element is returned with seq_ops->next(). See bug https://bugzilla.kernel.org/show_bug.cgi?id=206283 The commit effectively does: - increase pos for all seq_ops->start() - increase pos for all seq_ops->next() For ipv6_route, increasing pos for all seq_ops->next() is correct. But increasing pos for seq_ops->start() is not correct since pos is used to determine how many items to skip during seq_ops->start(): iter->skip = *pos; seq_ops->start() just fetches the *current* pos item. The item can be skipped only after seq_ops->show() which essentially is the beginning of seq_ops->next(). For example, I have 7 ipv6 route entries, root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 0+1 records in 0+1 records out 1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s root@arch-fb-vm1:~/net-next In the above, I specify buffer size 4096, so all records can be returned to user space with a single trip to the kernel. If I use buffer size 128, since each record size is 149, internally kernel seq_read() will read 149 into its internal buffer and return the data to user space in two read() syscalls. Then user read() syscall will trigger next seq_ops->start(). Since the current implementation increased pos even for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first record is #1. root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 4+1 records in 4+1 records out 600 bytes copied, 0.00127758 s, 470 kB/s To fix the problem, create a fake pos pointer so seq_ops->start() won't actually increase seq_file pos. With this fix, the above `dd` command with `bs=128` will show correct result. Fixes: 4fc427e0 ("ipv6_route_seq_next should increase position index") Cc: Alexei Starovoitov <ast@kernel.org> Suggested-by: NVasily Averin <vvs@virtuozzo.com> Reviewed-by: NVasily Averin <vvs@virtuozzo.com> Signed-off-by: NYonghong Song <yhs@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Karsten Graul 提交于
smc_ism_register_dmb() returns error codes set by the ISM driver which are not guaranteed to be negative or in the errno range. Such values would not be handled by ERR_PTR() and finally the return code will be used as a memory address. Fix that by using a valid negative errno value with ERR_PTR(). Fixes: 72b7f6c4 ("net/smc: unique reason code for exceeded max dmb count") Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Karsten Graul 提交于
The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the correct value is 6 which means 1MB. With 7 the registration of an ISM buffer would always fail because of the invalid size requested. Fix that and set the value to 6. Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-