- 17 2月, 2020 5 次提交
-
-
由 Matteo Croce 提交于
New action to decrement TTL instead of setting it to a fixed value. This action will decrement the TTL and, in case of expired TTL, drop it or execute an action passed via a nested attribute. The default TTL expired action is to drop the packet. Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively. Tested with a corresponding change in the userspace: # ovs-dpctl dump-flows in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1 in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2 in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2 in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1 # ping -c1 192.168.0.2 -t 42 IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64 # ping -c1 192.168.0.2 -t 120 IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64 # ping -c1 192.168.0.2 -t 1 # Co-developed-by: NBindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: NBindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: NMatteo Croce <mcroce@redhat.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arjun Roy 提交于
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using epoll, returning sk_err along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a spurious wakeup. Consider a multi-threaded application using epoll. A thread may awaken with EPOLLIN but another thread may already be reading. The spuriously-awoken thread does not necessarily know that another thread 'won'; rather, it may be possible that it was woken up due to the presence of an error if there is no data. A zerocopy read receiving 0 bytes thus would need to be followed up by recvmsg to be sure. Instead, we return sk_err directly with zerocopy, so the application can avoid this extra system call. Signed-off-by: NArjun Roy <arjunroy@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arjun Roy 提交于
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using edge-triggered epoll, returning inq along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking, since normally we would need to perform a recvmsg() call for every successful small RPC read via TCP receive zerocopy, returning inq can reduce the number of system calls performed by approximately half. Signed-off-by: NArjun Roy <arjunroy@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sebastien Boeuf 提交于
Whenever the vsock backend on the host sends a packet through the RX queue, it expects an answer on the TX queue. Unfortunately, there is one case where the host side will hang waiting for the answer and might effectively never recover if no timeout mechanism was implemented. This issue happens when the guest side starts binding to the socket, which insert a new bound socket into the list of already bound sockets. At this time, we expect the guest to also start listening, which will trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem occurs if the host side queued a RX packet and triggered an interrupt right between the end of the binding process and the beginning of the listening process. In this specific case, the function processing the packet virtio_transport_recv_pkt() will find a bound socket, which means it will hit the switch statement checking for the sk_state, but the state won't be changed into TCP_LISTEN yet, which leads the code to pick the default statement. This default statement will only free the buffer, while it should also respond to the host side, by sending a packet on its TX queue. In order to simply fix this unfortunate chain of events, it is important that in case the default statement is entered, and because at this stage we know the host side is waiting for an answer, we must send back a packet containing the operation VIRTIO_VSOCK_OP_RST. One could say that a proper timeout mechanism on the host side will be enough to avoid the backend to hang. But the point of this patch is to ensure the normal use case will be provided with proper responsiveness when it comes to establishing the connection. Signed-off-by: NSebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 chenqiwu 提交于
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Nchenqiwu <chenqiwu@xiaomi.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 2月, 2020 14 次提交
-
-
由 Per Forlin 提交于
Passing tag size to skb_cow_head will make sure there is enough headroom for the tag data. This change does not introduce any overhead in case there is already available headroom for tag. Signed-off-by: NPer Forlin <perfn@axis.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Per Forlin 提交于
Passing tag size to skb_cow_head will make sure there is enough headroom for the tag data. This change does not introduce any overhead in case there is already available headroom for tag. Signed-off-by: NPer Forlin <perfn@axis.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 William Dauchy 提交于
With ipip, it is possible to create an extra interface explicitly attached to a given physical interface: # ip link show tunl0 4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ipip 0.0.0.0 brd 0.0.0.0 # ip link add tunl1 type ipip dev eth0 # ip link show tunl1 6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ipip 0.0.0.0 brd 0.0.0.0 But it is not possible with ip6tnl: # ip link show ip6tnl0 5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/tunnel6 :: brd :: # ip link add ip6tnl1 type ip6tnl dev eth0 RTNETLINK answers: File exists This patch aims to make it possible by adding link comparaison in both tunnel locate and lookup functions; we also modify mtu calculation when attached to an interface with a lower mtu. This permits to make use of x-netns communication by moving the newly created tunnel in a given netns. Signed-off-by: NWilliam Dauchy <w.dauchy@criteo.com> Reviewed-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ursula Braun 提交于
Just SMCR requires a CLC Peer ID, but not SMCD. The field should be zero for SMCD. Fixes: c758dfdd ("net/smc: add SMC-D support in CLC messages") Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ursula Braun 提交于
SMC does not work together with FASTOPEN. If sendmsg() is called with flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to fallback mode. To handle the previous ioctl FIOASYNC call correctly in this case, it is necessary to transfer the socket wait queue fasync_list to the internal TCP socket. Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com Fixes: ee9dfbef ("net/smc: handle sockopts forcing fallback") Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Crispin 提交于
Currently a mac80211 driver can only set the txq_limit when using wake_tx_queue. Not all drivers use wake_tx_queue. This patch adds a new element to wiphy allowing a driver to set a custom tx_queue_len and the code that will apply it in case it is set. The current default is 1000 which is too low for ath11k when doing HE rates. Signed-off-by: NJohn Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200211122605.13002-1-john@phrozen.orgSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Ben Greear 提交于
With multiple VIFS ath10k, and probably others, tries to find the minimum txpower for all vifs and uses that when setting txpower in the firmware. If a second vif is added and starts to scan, it's txpower is not initialized yet and it set to zero. ath10k had a patch to ignore zero values, but then it is impossible to actually set txpower to zero. So, instead initialize the txpower to INT_MIN in mac80211, and let drivers know that means the power has not been set and so should be ignored. This should fix regression in: commit 88407beb Author: Ryan Hsu <ryanhsu@qca.qualcomm.com> Date: Tue Dec 13 14:55:19 2016 -0800 ath10k: fix incorrect txpower set by P2P_DEVICE interface Tested on ath10k 9984 with ath10k-ct firmware. Signed-off-by: NBen Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20191217183057.24586-1-greearb@candelatech.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Shay Bar 提交于
Before this patch, STA's would set new width of 160/80+80 MHz based on AP capability only. This is wrong because STA may not support > 80MHz BW. Fix is to verify STA has 160/80+80 MHz capability before increasing its width to > 80MHz. The "support_80_80" and "support_160" setting is based on: "Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW Support subfield at a STA transmitting the VHT Capabilities Information field" From "Draft P802.11REVmd_D3.0.pdf" Signed-off-by: NAviad Brikman <aviad.brikman@celeno.com> Signed-off-by: NShay Bar <shay.bar@celeno.com> Link: https://lore.kernel.org/r/20200210130728.23674-1-shay.bar@celeno.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Sergey Matyukevich 提交于
The nl80211_policy is missing for NL80211_ATTR_STATUS_CODE attribute. As a result, for strictly validated commands, it's assumed to not be supported. Signed-off-by: NSergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20200213131608.10541-2-sergey.matyukevich.os@quantenna.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Jason A. Donenfeld 提交于
Because xfrmi is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason A. Donenfeld 提交于
This introduces a helper function to be called only by network drivers that wraps calls to icmp[v6]_send in a conntrack transformation, in case NAT has been used. We don't want to pollute the non-driver path, though, so we introduce this as a helper to be called by places that actually make use of this, as suggested by Florian. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
unlike other classifiers that can be offloaded (i.e. users can set flags like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry to fl_policy. Fixes: 5b33f488 ("net/flower: Introduce hardware offload support") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
unlike other classifiers that can be offloaded (i.e. users can set flags like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper entry to mall_policy. Fixes: b87f7936 ("net/sched: Add match-all classifier hw offloading.") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Li RongQing 提交于
"do {} while" in page_pool_refill_alloc_cache will always refill page once whether refill is true or false, and whether alloc.count of pool is less than PP_ALLOC_CACHE_REFILL or not this is wrong, and will cause overflow of pool->alloc.cache the caller of __page_pool_get_cached should provide guarantee that pool->alloc.cache is safe to access, so in_serving_softirq should be removed as suggested by Jesper Dangaard Brouer in https://patchwork.ozlabs.org/patch/1233713/ so fix this issue by calling page_pool_refill_alloc_cache() only when pool->alloc.count is zero Fixes: 44768dec ("page_pool: handle page recycle for NUMA_NO_NODE condition") Signed-off-by: NLi RongQing <lirongqing@baidu.com> Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2020 2 次提交
-
-
由 Eric Dumazet 提交于
As nlmsg_put() does not clear the memory that is reserved, it this the caller responsability to make sure all of this memory will be written, in order to not reveal prior content. While we are at it, we can provide the socket cookie even if clsock is not set. syzbot reported : BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline] BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline] BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline] BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline] BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline] BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline] BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252 CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] __fswab32 include/uapi/linux/swab.h:59 [inline] __swab32p include/uapi/linux/swab.h:179 [inline] __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline] get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline] ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline] ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline] bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_kmalloc_large+0x73/0xc0 mm/kmsan/kmsan_hooks.c:128 kmalloc_large_node_hook mm/slub.c:1406 [inline] kmalloc_large_node+0x282/0x2c0 mm/slub.c:3841 __kmalloc_node_track_caller+0x44b/0x1200 mm/slub.c:4368 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_dump+0x44b/0x1ab0 net/netlink/af_netlink.c:2224 __netlink_dump_start+0xbb2/0xcf0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:233 [inline] smc_diag_handler_dump+0x2ba/0x300 net/smc/smc_diag.c:242 sock_diag_rcv_msg+0x211/0x610 net/core/sock_diag.c:256 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] kernel_sendmsg+0x433/0x440 net/socket.c:679 sock_no_sendpage+0x235/0x300 net/core/sock.c:2740 kernel_sendpage net/socket.c:3776 [inline] sock_sendpage+0x1e1/0x2c0 net/socket.c:937 pipe_to_sendpage+0x38c/0x4c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x539/0xed0 fs/splice.c:636 splice_from_pipe fs/splice.c:671 [inline] generic_splice_sendpage+0x1d5/0x2d0 fs/splice.c:844 do_splice_from fs/splice.c:863 [inline] do_splice fs/splice.c:1170 [inline] __do_sys_splice fs/splice.c:1447 [inline] __se_sys_splice+0x2380/0x3350 fs/splice.c:1427 __x64_sys_splice+0x6e/0x90 fs/splice.c:1427 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: f16a7dd5 ("smc: netlink interface for SMC sockets") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toke Høiland-Jørgensen 提交于
The current generic XDP handler skips execution of XDP programs entirely if an SKB is marked as cloned. This leads to some surprising behaviour, as packets can end up being cloned in various ways, which will make an XDP program not see all the traffic on an interface. This was discovered by a simple test case where an XDP program that always returns XDP_DROP is installed on a veth device. When combining this with the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending side, SKBs reliably end up in the cloned state, causing them to be passed through to the receiving interface instead of being dropped. A minimal reproducer script for this is included below. This patch fixed the issue by simply triggering the existing linearisation code for cloned SKBs instead of skipping the XDP program execution. This behaviour is in line with the behaviour of the native XDP implementation for the veth driver, which will reallocate and copy the SKB data if the SKB is marked as shared. Reproducer Python script (requires BCC and Scapy): from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP from bcc import BPF import time, sys, subprocess, shlex SKB_MODE = (1 << 1) DRV_MODE = (1 << 2) PYTHON=sys.executable def client(): time.sleep(2) # Sniffing on the sender causes skb_cloned() to be set s = AsyncSniffer() s.start() for p in range(10): sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"), verbose=False) time.sleep(0.1) s.stop() return 0 def server(mode): prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}") func = prog.load_func("dummy_drop", BPF.XDP) prog.attach_xdp("a_to_b", func, mode) time.sleep(1) s = sniff(iface="a_to_b", count=10, timeout=15) if len(s): print(f"Got {len(s)} packets - should have gotten 0") return 1 else: print("Got no packets - as expected") return 0 if len(sys.argv) < 2: print(f"Usage: {sys.argv[0]} <skb|drv>") sys.exit(1) if sys.argv[1] == "client": sys.exit(client()) elif sys.argv[1] == "server": mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE sys.exit(server(mode)) else: try: mode = sys.argv[1] if mode not in ('skb', 'drv'): print(f"Usage: {sys.argv[0]} <skb|drv>") sys.exit(1) print(f"Running in {mode} mode") for cmd in [ 'ip netns add netns_a', 'ip netns add netns_b', 'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b', # Disable ipv6 to make sure there's no address autoconf traffic 'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1', 'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1', 'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa', 'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc', 'ip -n netns_a link set dev a_to_b up', 'ip -n netns_b link set dev b_to_a up']: subprocess.check_call(shlex.split(cmd)) server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}")) client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client")) client.wait() server.wait() sys.exit(server.returncode) finally: subprocess.run(shlex.split("ip netns delete netns_a")) subprocess.run(shlex.split("ip netns delete netns_b")) Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices") Reported-by: NStepan Horacek <shoracek@redhat.com> Suggested-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 2月, 2020 3 次提交
-
-
由 Tuong Lien 提交于
In commit 9546a0b7 ("tipc: fix wrong connect() return code"), we fixed the issue with the 'connect()' that returns zero even though the connecting has failed by waiting for the connection to be 'ESTABLISHED' really. However, the approach has one drawback in conjunction with our 'lightweight' connection setup mechanism that the following scenario can happen: (server) (client) +- accept()| | wait_for_conn() | | |connect() -------+ | |<-------[SYN]---------| > sleeping | | *CONNECTING | |--------->*ESTABLISHED | | |--------[ACK]-------->*ESTABLISHED > wakeup() send()|--------[DATA]------->|\ > wakeup() send()|--------[DATA]------->| | > wakeup() . . . . |-> recvq . . . . . | . send()|--------[DATA]------->|/ > wakeup() close()|--------[FIN]-------->*DISCONNECTING | *DISCONNECTING | | | ~~~~~~~~~~~~~~~~~~> schedule() | wait again . . | ETIMEDOUT Upon the receipt of the server 'ACK', the client becomes 'ESTABLISHED' and the 'wait_for_conn()' process is woken up but not run. Meanwhile, the server starts to send a number of data following by a 'close()' shortly without waiting any response from the client, which then forces the client socket to be 'DISCONNECTING' immediately. When the wait process is switched to be running, it continues to wait until the timer expires because of the unexpected socket state. The client 'connect()' will finally get ‘-ETIMEDOUT’ and force to release the socket whereas there remains the messages in its receive queue. Obviously the issue would not happen if the server had some delay prior to its 'close()' (or the number of 'DATA' messages is large enough), but any kind of delay would make the connection setup/shutdown "heavy". We solve this by simply allowing the 'connect()' returns zero in this particular case. The socket is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue. Note: This solution doesn't break the previous one as it deals with a different situation that the socket state is 'DISCONNECTING' but has no error (i.e. sk->sk_err = 0). Fixes: 9546a0b7 ("tipc: fix wrong connect() return code") Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chen Wandun 提交于
Fix the following sparse warning: net/mptcp/protocol.c:646:13: warning: symbol 'mptcp_sk_clone_lock' was not declared. Should it be static? Fixes: b0519de8 ("mptcp: fix use-after-free for ipv6") Signed-off-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chen Wandun 提交于
Fix the following sparse warning: net/tipc/node.c:281:6: warning: symbol 'tipc_node_free' was not declared. Should it be static? net/tipc/node.c:2801:5: warning: symbol '__tipc_nl_node_set_key' was not declared. Should it be static? net/tipc/node.c:2878:5: warning: symbol '__tipc_nl_node_flush_key' was not declared. Should it be static? Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication") Fixes: e1f32190 ("tipc: add support for AEAD key setting via netlink") Signed-off-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 2月, 2020 12 次提交
-
-
由 Martin KaFai Lau 提交于
It was reported that the max_t, ilog2, and roundup_pow_of_two macros have exponential effects on the number of states in the sparse checker. This patch breaks them up by calculating the "nbuckets" first so that the "bucket_log" only needs to take ilog2(). In addition, Linus mentioned: Patch looks good, but I'd like to point out that it's not just sparse. You can see it with a simple make net/core/bpf_sk_storage.i grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wc and see the end result: 1 365071 2686974 That's one line (the assignment line) that is 2,686,974 characters in length. Now, sparse does happen to react particularly badly to that (I didn't look to why, but I suspect it's just that evaluating all the types that don't actually ever end up getting used ends up being much more expensive than it should be), but I bet it's not good for gcc either. Fixes: 6ac99e8f ("bpf: Introduce bpf sk local storage") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com> Link: https://lore.kernel.org/bpf/20200207081810.3918919-1-kafai@fb.com
-
由 Jakub Sitnicki 提交于
We need to have a synchronize_rcu before free'ing the sockhash because any outstanding psock references will have a pointer to the map and when they use it, this could trigger a use after free. This is a sister fix for sockhash, following commit 2bb90e5c ("bpf: sockmap, synchronize_rcu before free'ing map") which addressed sockmap, which comes from a manual audit. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
-
由 Jakub Sitnicki 提交于
rcu_read_lock is needed to protect access to psock inside sock_map_unref when tearing down the map. However, we can't afford to sleep in lock_sock while in RCU read-side critical section. Grab the RCU lock only after we have locked the socket. This fixes RCU warnings triggerable on a VM with 1 vCPU when free'ing a sockmap/sockhash that contains at least one socket: | ============================= | WARNING: suspicious RCU usage | 5.5.0-04005-g8fc91b97 #450 Not tainted | ----------------------------- | include/linux/rcupdate.h:272 Illegal context switch in RCU read-side critical section! | | other info that might help us debug this: | | | rcu_scheduler_active = 2, debug_locks = 1 | 4 locks held by kworker/0:1/62: | #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0 | #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0 | #2: ffffffff82065d20 (rcu_read_lock){....}, at: sock_map_free+0x5/0x170 | #3: ffff8881368c5df8 (&stab->lock){+...}, at: sock_map_free+0x64/0x170 | | stack backtrace: | CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04005-g8fc91b97 #450 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 | Workqueue: events bpf_map_free_deferred | Call Trace: | dump_stack+0x71/0xa0 | ___might_sleep+0x105/0x190 | lock_sock_nested+0x28/0x90 | sock_map_free+0x95/0x170 | bpf_map_free_deferred+0x58/0x80 | process_one_work+0x260/0x5e0 | worker_thread+0x4d/0x3e0 | kthread+0x108/0x140 | ? process_one_work+0x5e0/0x5e0 | ? kthread_park+0x90/0x90 | ret_from_fork+0x3a/0x50 | ============================= | WARNING: suspicious RCU usage | 5.5.0-04005-g8fc91b97-dirty #452 Not tainted | ----------------------------- | include/linux/rcupdate.h:272 Illegal context switch in RCU read-side critical section! | | other info that might help us debug this: | | | rcu_scheduler_active = 2, debug_locks = 1 | 4 locks held by kworker/0:1/62: | #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0 | #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0 | #2: ffffffff82065d20 (rcu_read_lock){....}, at: sock_hash_free+0x5/0x1d0 | #3: ffff888139966e00 (&htab->buckets[i].lock){+...}, at: sock_hash_free+0x92/0x1d0 | | stack backtrace: | CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04005-g8fc91b97-dirty #452 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 | Workqueue: events bpf_map_free_deferred | Call Trace: | dump_stack+0x71/0xa0 | ___might_sleep+0x105/0x190 | lock_sock_nested+0x28/0x90 | sock_hash_free+0xec/0x1d0 | bpf_map_free_deferred+0x58/0x80 | process_one_work+0x260/0x5e0 | worker_thread+0x4d/0x3e0 | kthread+0x108/0x140 | ? process_one_work+0x5e0/0x5e0 | ? kthread_park+0x90/0x90 | ret_from_fork+0x3a/0x50 Fixes: 7e81a353 ("bpf: Sockmap, ensure sock lock held during tear down") Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200206111652.694507-2-jakub@cloudflare.com
-
由 Lorenz Bauer 提交于
It's currently possible to insert sockets in unexpected states into a sockmap, due to a TOCTTOU when updating the map from a syscall. sock_map_update_elem checks that sk->sk_state == TCP_ESTABLISHED, locks the socket and then calls sock_map_update_common. At this point, the socket may have transitioned into another state, and the earlier assumptions don't hold anymore. Crucially, it's conceivable (though very unlikely) that a socket has become unhashed. This breaks the sockmap's assumption that it will get a callback via sk->sk_prot->unhash. Fix this by checking the (fixed) sk_type and sk_protocol without the lock, followed by a locked check of sk_state. Unfortunately it's not possible to push the check down into sock_(map|hash)_update_common, since BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB run before the socket has transitioned from TCP_SYN_RECV into TCP_ESTABLISHED. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20200207103713.28175-1-lmb@cloudflare.com
-
由 Al Viro 提交于
The former contains nothing but a pointer to an array of the latter... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Sandeen 提交于
Unused now. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and now errorf() et.al. are never called with NULL fs_context, so we can get rid of conditional in those. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
fs_parse() analogue taking p_log instead of fs_context. fs_parse() turned into a wrapper, callers in ceph_common and rbd switched to __fs_parse(). As the result, fs_parse() never gets NULL fs_context and neither do fs_context-based logging primitives Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
When upcalling gssproxy, cache_head.expiry_time is set as a timeval, not seconds since boot. As such, RPC cache expiry logic will not clean expired objects created under auth.rpcsec.context cache. This has proven to cause kernel memory leaks on field. Using 64 bit variants of getboottime/timespec Expiration times have worked this way since 2010's c5b29f88 "sunrpc: use seconds since boot in expiry cache". The gssproxy code introduced in 2012 added gss_proxy_save_rsc and introduced the bug. That's a while for this to lurk, but it required a bit of an extreme case to make it obvious. Signed-off-by: NRoberto Bergantinos Corpas <rbergant@redhat.com> Cc: stable@vger.kernel.org Fixes: 030d794b "SUNRPC: Use gssproxy upcall for server..." Tested-By: NFrank Sorenson <sorenson@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Ido Schimmel 提交于
Drop monitor uses a work item that takes care of constructing and sending netlink notifications to user space. In case drop monitor never started to monitor, then the work item is uninitialized and not associated with a function. Therefore, a stop command from user space results in canceling an uninitialized work item which leads to the following warning [1]. Fix this by not processing a stop command if drop monitor is not currently monitoring. [1] [ 31.735402] ------------[ cut here ]------------ [ 31.736470] WARNING: CPU: 0 PID: 143 at kernel/workqueue.c:3032 __flush_work+0x89f/0x9f0 ... [ 31.738120] CPU: 0 PID: 143 Comm: dwdump Not tainted 5.5.0-custom-09491-g16d4077796b8 #727 [ 31.741968] RIP: 0010:__flush_work+0x89f/0x9f0 ... [ 31.760526] Call Trace: [ 31.771689] __cancel_work_timer+0x2a6/0x3b0 [ 31.776809] net_dm_cmd_trace+0x300/0xef0 [ 31.777549] genl_rcv_msg+0x5c6/0xd50 [ 31.781005] netlink_rcv_skb+0x13b/0x3a0 [ 31.784114] genl_rcv+0x29/0x40 [ 31.784720] netlink_unicast+0x49f/0x6a0 [ 31.787148] netlink_sendmsg+0x7cf/0xc80 [ 31.790426] ____sys_sendmsg+0x620/0x770 [ 31.793458] ___sys_sendmsg+0xfd/0x170 [ 31.802216] __sys_sendmsg+0xdf/0x1a0 [ 31.806195] do_syscall_64+0xa0/0x540 [ 31.806885] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 8e94c3bc ("drop_monitor: Allow user to start monitoring hardware drops") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
__in6_dev_get(dev) called from inet6_set_link_af() can return NULL. The needed check has been recently removed, let's add it back. While do_setlink() does call validate_linkmsg() : ... err = validate_linkmsg(dev, tb); /* OK at this point */ ... It is possible that the following call happening before the ->set_link_af() removes IPv6 if MTU is less than 1280 : if (tb[IFLA_MTU]) { err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack); if (err < 0) goto errout; status |= DO_SETLINK_MODIFIED; } ... if (tb[IFLA_AF_SPEC]) { ... err = af_ops->set_link_af(dev, af); ->inet6_set_link_af() // CRASH because idev is NULL Please note that IPv4 is immune to the bug since inet_set_link_af() does : struct in_device *in_dev = __in_dev_get_rcu(dev); if (!in_dev) return -EAFNOSUPPORT; This problem has been mentioned in commit cf7afbfe ("rtnl: make link af-specific updates atomic") changelog : This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7] CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754 rtnl_group_changelink net/core/rtnetlink.c:3103 [inline] __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4402e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70 R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace cfa7664b8fdcdff3 ]--- RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 7dc2bcca ("Validate required parameters in inet6_validate_link_af") Signed-off-by: NEric Dumazet <edumazet@google.com> Bisected-and-reported-by: Nsyzbot <syzkaller@googlegroups.com> Cc: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2020 4 次提交
-
-
由 Markus Theil 提交于
This is now a trivial patch, but for seeing the actual changes I (Johannes) split it out from the original. Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200115125522.3755-1-markus.theil@tu-ilmenau.de [split into separate cfg80211/mac80211 patches] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Markus Theil 提交于
When using control port over nl80211 in AP mode with pre-authentication, APs need to forward frames to other APs defined by their MAC address. Before this patch, pre-auth frames reaching user space over nl80211 control port have no longer any information about the dest attached, which can be used for forwarding to a controller or injecting the frame back to a ethernet interface over a AF_PACKET socket. Analog problems exist, when forwarding pre-auth frames from AP -> STA. This patch therefore adds the NL80211_ATTR_DST_MAC and NL80211_ATTR_SRC_MAC attributes to provide more context information when forwarding. The respective arguments are optional on tx and included on rx. Therefore unaware existing software is not affected. Software which wants to detect this feature, can do so by checking against: NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211_MAC_ADDRS Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200115125522.3755-1-markus.theil@tu-ilmenau.de [split into separate cfg80211/mac80211 patches] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Shaul Triebitz 提交于
Parse also the RSN Extension IE when parsing the rest of the IEs. It will be used in a later patch. Signed-off-by: NShaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: NLuca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20200131111300.891737-21-luca@coelho.fiSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Ilan Peer 提交于
To support Pre Association Security Negotiation (PASN) while already associated to one AP, allow user space to register to Rx authentication frames, so that the user space logic would be able to receive/handle authentication frames from a different AP as part of PASN. Note that it is expected that user space would intelligently register for Rx authentication frames, i.e., only when PASN is used and configure a match filter only for PASN authentication algorithm, as otherwise the MLME functionality of mac80211 would be broken. Additionally, since some versions of the user space daemons wrongly register to all types of authentication frames (which might result in unexpected behavior) allow such registration if the request is for a specific authentication algorithm number. Signed-off-by: NIlan Peer <ilan.peer@intel.com> Signed-off-by: NLuca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20200131114529.894206-1-luca@coelho.fiSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-