- 25 11月, 2020 1 次提交
-
-
由 Björn Töpel 提交于
Commit 642e450b ("xsk: Do not discard packet when NETDEV_TX_BUSY") addressed the problem that packets were discarded from the Tx AF_XDP ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was bumping the skbuff reference count, so that the buffer would not be freed by dev_direct_xmit(). A reference count larger than one means that the skbuff is "shared", which is not the case. If the "shared" skbuff is sent to the generic XDP receive path, netif_receive_generic_xdp(), and pskb_expand_head() is entered the BUG_ON(skb_shared(skb)) will trigger. This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(), where a user can select the skbuff free policy. This allows AF_XDP to avoid bumping the reference count, but still keep the NETDEV_TX_BUSY behavior. Fixes: 642e450b ("xsk: Do not discard packet when NETDEV_TX_BUSY") Reported-by: NYonghong Song <yhs@fb.com> Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com
-
- 23 11月, 2020 1 次提交
-
-
由 Marek Majtyka 提交于
Fix incorrect netdev reference count in xsk_bind operation. Incorrect reference count of the device appears when a user calls bind with the XDP_ZEROCOPY flag on an interface which does not support zero-copy. In such a case, an error is returned but the reference count is not decreased. This change fixes the fault, by decreasing the reference count in case of such an error. The problem being corrected appeared in '162c820e' for the first time, and the code was moved to new file location over the time with commit 'c2d3d6a4'. This specific patch applies to all version starting from 'c2d3d6a4'. The same solution should be applied but on different file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev) for versions from '162c820e' to 'c2d3d6a4' excluded. Fixes: 162c820e ("xdp: hold device for umem regardless of zero-copy mode") Signed-off-by: NMarek Majtyka <marekx.majtyka@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NMagnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201120151443.105903-1-marekx.majtyka@intel.com
-
- 20 11月, 2020 4 次提交
-
-
由 Magnus Karlsson 提交于
Fix a bug that is triggered when a partially setup socket is destroyed. For a fully setup socket, a socket that has been bound to a device, the cleanup of the umem is performed at the end of the buffer pool's cleanup work queue item. This has to be performed in a work queue, and not in RCU cleanup, as it is doing a vunmap that cannot execute in interrupt context. However, when a socket has only been partially set up so that a umem has been created but the buffer pool has not, the code erroneously directly calls the umem cleanup function instead of using a work queue, and this leads to a BUG_ON() in vunmap(). As there in this case is no buffer pool, we cannot use its work queue, so we need to introduce a work queue for the umem and schedule this for the cleanup. So in the case there is no pool, we are going to use the umem's own work queue to schedule the cleanup. But if there is a pool, the cleanup of the umem is still being performed by the pool's work queue, as it is important that the umem is cleaned up after the pool. Fixes: e5e1a4bc ("xsk: Fix possible memory leak at socket close") Reported-by: NMarek Majtyka <marekx.majtyka@intel.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Tested-by: NMarek Majtyka <marekx.majtyka@intel.com> Link: https://lore.kernel.org/bpf/1605873219-21629-1-git-send-email-magnus.karlsson@gmail.com
-
由 Karsten Graul 提交于
Sparse complaints 3 times about: net/smc/smc_ib.c:203:52: warning: incorrect type in argument 1 (different address spaces) net/smc/smc_ib.c:203:52: expected struct net_device const *dev net/smc/smc_ib.c:203:52: got struct net_device [noderef] __rcu *const ndev Fix that by using the existing and validated ndev variable instead of accessing attr->ndev directly. Fixes: 5102eca9 ("net/smc: Use rdma_read_gid_l2_fields to L2 fields") Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Karsten Graul 提交于
With the multi-subnet support of SMC-Dv2 the match for existing link groups should not include the vlanid of the network device. Set ini->smcd_version accordingly before the call to smc_conn_create() and use this value in smc_conn_create() to skip the vlanid check. Fixes: 5c21c4cc ("net/smc: determine accepted ISM devices") Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Georg Kohmann 提交于
IPV6=m NF_DEFRAG_IPV6=y ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function `nf_ct_frag6_gather': net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to `ipv6_frag_thdr_truncated' Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This dependency is forcing IPV6=y. Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This is the same solution as used with a similar issues: Referring to commit 70b095c8 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module") Fixes: 9d9e937b ("ipv6/netfilter: Discard first fragment not including all headers") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NGeorg Kohmann <geokohma@cisco.com> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 19 11月, 2020 2 次提交
-
-
由 Florian Fainelli 提交于
DSA network devices rely on having their DSA management interface up and running otherwise their ndo_open() will return -ENETDOWN. Without doing this it would not be possible to use DSA devices as netconsole when configured on the command line. These devices also do not utilize the upper/lower linking so the check about the netpoll device having upper is not going to be a problem. The solution adopted here is identical to the one done for net/ipv4/ipconfig.c with 728c0208 ("net: ipv4: handle DSA enabled master network devices"), with the network namespace scope being restricted to that of the process configuring netpoll. Fixes: 04ff53f9 ("net: dsa: Add netconsole support") Tested-by: NVladimir Oltean <olteanv@gmail.com> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Zhang Changzhong 提交于
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 18 11月, 2020 10 次提交
-
-
由 Florian Klink 提交于
Checking for ifdef CONFIG_x fails if CONFIG_x=m. Use IS_ENABLED instead, which is true for both built-ins and modules. Otherwise, a > ip -4 route add 1.2.3.4/32 via inet6 fe80::2 dev eth1 fails with the message "Error: IPv6 support not enabled in kernel." if CONFIG_IPV6 is `m`. In the spirit of b8127113. Fixes: d1566268 ("ipv4: Allow ipv6 gateway with ipv4 routes") Cc: Kim Phillips <kim.phillips@arm.com> Signed-off-by: NFlorian Klink <flokli@flokli.de> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201115224509.2020651-1-flokli@flokli.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Wang Hai 提交于
nlmsg_cancel() needs to be called in the error path of inet_req_diag_fill to cancel the message. Fixes: d545caca ("net: inet: diag: expose the socket mark to privileged processes.") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NWang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201116082018.16496-1-wanghai38@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 John Fastabend 提交于
When skb has a frag_list its possible for skb_to_sgvec() to fail. This happens when the scatterlist has fewer elements to store pages than would be needed for the initial skb plus any of its frags. This case appears rare, but is possible when running an RX parser/verdict programs exposed to the internet. Currently, when this happens we throw an error, break the pipe, and kfree the msg. This effectively breaks the application or forces it to do a retry. Lets catch this case and handle it by doing an skb_linearize() on any skb we receive with frags. At this point skb_to_sgvec should not fail because the failing conditions would require frags to be in place. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556576837.73229.14800682790808797635.stgit@john-XPS-13-9370
-
由 John Fastabend 提交于
If the skb_verdict_prog redirects an skb knowingly to itself, fix your BPF program this is not optimal and an abuse of the API please use SK_PASS. That said there may be cases, such as socket load balancing, where picking the socket is hashed based or otherwise picks the same socket it was received on in some rare cases. If this happens we don't want to confuse userspace giving them an EAGAIN error if we can avoid it. To avoid double accounting in these cases. At the moment even if the skb has already been charged against the sockets rcvbuf and forward alloc we check it again and do set_owner_r() causing it to be orphaned and recharged. For one this is useless work, but more importantly we can have a case where the skb could be put on the ingress queue, but because we are under memory pressure we return EAGAIN. The trouble here is the skb has already been accounted for so any rcvbuf checks include the memory associated with the packet already. This rolls up and can result in unnecessary EAGAIN errors in userspace read() calls. Fix by doing an unlikely check and skipping checks if skb->sk == sk. Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370
-
由 John Fastabend 提交于
If a socket redirects to itself and it is under memory pressure it is possible to get a socket stuck so that recv() returns EAGAIN and the socket can not advance for some time. This happens because when redirecting a skb to the same socket we received the skb on we first check if it is OK to enqueue the skb on the receiving socket by checking memory limits. But, if the skb is itself the object holding the memory needed to enqueue the skb we will keep retrying from kernel side and always fail with EAGAIN. Then userspace will get a recv() EAGAIN error if there are no skbs in the psock ingress queue. This will continue until either some skbs get kfree'd causing the memory pressure to reduce far enough that we can enqueue the pending packet or the socket is destroyed. In some cases its possible to get a socket stuck for a noticeable amount of time if the socket is only receiving skbs from sk_skb verdict programs. To reproduce I make the socket memory limits ridiculously low so sockets are always under memory pressure. More often though if under memory pressure it looks like a spurious EAGAIN error on user space side causing userspace to retry and typically enough has moved on the memory side that it works. To fix skip memory checks and skb_orphan if receiving on the same sock as already assigned. For SK_PASS cases this is easy, its always the same socket so we can just omit the orphan/set_owner pair. For backlog cases we need to check skb->sk and decide if the orphan and set_owner pair are needed. Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
-
由 John Fastabend 提交于
We use skb->size with sk_rmem_scheduled() which is not correct. Instead use truesize to align with socket and tcp stack usage of sk_rmem_schedule. Suggested-by: NDaniel Borkman <daniel@iogearbox.net> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556570616.73229.17003722112077507863.stgit@john-XPS-13-9370
-
由 John Fastabend 提交于
Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This allows users to tune SO_RCVBUF and sockmap will honor them. We can refactor the if(charge) case out in later patches. But, keep this fix to the point. Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Suggested-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556568657.73229.8404601585878439060.stgit@john-XPS-13-9370
-
由 John Fastabend 提交于
If copy_page_to_iter() fails or even partially completes, but with fewer bytes copied than expected we currently reset sg.start and return EFAULT. This proves problematic if we already copied data into the user buffer before we return an error. Because we leave the copied data in the user buffer and fail to unwind the scatterlist so kernel side believes data has been copied and user side believes data has _not_ been received. Expected behavior should be to return number of bytes copied and then on the next read we need to return the error assuming its still there. This can happen if we have a copy length spanning multiple scatterlist elements and one or more complete before the error is hit. The error is rare enough though that my normal testing with server side programs, such as nginx, httpd, envoy, etc., I have never seen this. The only reliable way to reproduce that I've found is to stream movies over my browser for a day or so and wait for it to hang. Not very scientific, but with a few extra WARN_ON()s in the code the bug was obvious. When we review the errors from copy_page_to_iter() it seems we are hitting a page fault from copy_page_to_iter_iovec() where the code checks fault_in_pages_writeable(buf, copy) where buf is the user buffer. It also seems typical server applications don't hit this case. The other way to try and reproduce this is run the sockmap selftest tool test_sockmap with data verification enabled, but it doesn't reproduce the fault. Perhaps we can trigger this case artificially somehow from the test tools. I haven't sorted out a way to do that yet though. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370
-
由 Tariq Toukan 提交于
In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646 ("net/tls: Add asynchronous resync") Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Reviewed-by: NBoris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ryan Sharpelletti 提交于
During loss recovery, retransmitted packets are forced to use TCP timestamps to calculate the RTT samples, which have a millisecond granularity. BBR is designed using a microsecond granularity. As a result, multiple RTT samples could be truncated to the same RTT value during loss recovery. This is problematic, as BBR will not enter PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning that if there are persistent losses, PROBE_RTT will constantly be pushed off and potentially never re-entered. This patch makes sure that BBR enters PROBE_RTT by checking if RTT sample is < the current min_rtt sample, rather than <=. The Netflix transport/TCP team discovered this bug in the Linux TCP BBR code during lab tests. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: NRyan Sharpelletti <sharpelletti@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 11月, 2020 3 次提交
-
-
由 Vadim Fedorenko 提交于
If tcp socket has more data than Encrypted Handshake Message then tls_sw_recvmsg will try to decrypt next record instead of returning full control message to userspace as mentioned in comment. The next message - usually Application Data - gets corrupted because it uses zero copy for decryption that's why the data is not stored in skb for next iteration. Revert check to not decrypt next record if current is not Application Data. Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ruSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Heiner Kallweit 提交于
In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: 28172739 ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Georg Kohmann 提交于
Packets are processed even though the first fragment don't include all headers through the upper layer header. This breaks TAHI IPv6 Core Conformance Test v6LC.1.3.6. Referring to RFC8200 SECTION 4.5: "If the first fragment does not include all headers through an Upper-Layer header, then that fragment should be discarded and an ICMP Parameter Problem, Code 3, message should be sent to the source of the fragment, with the Pointer field set to zero." The fragment needs to be validated the same way it is done in commit 2efdaaaf ("IPv6: reply ICMP error if the first fragment don't include all headers") for ipv6. Wrap the validation into a common function, ipv6_frag_thdr_truncated() to check for truncation in the upper layer header. This validation does not fullfill all aspects of RFC 8200, section 4.5, but is at the moment sufficient to pass mentioned TAHI test. In netfilter, utilize the fragment offset returned by find_prev_fhdr() to let ipv6_frag_thdr_truncated() start it's traverse from the fragment header. Return 0 to drop the fragment in the netfilter. This is the same behaviour as used on other protocol errors in this function, e.g. when nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an appropriate ICMP Parameter Problem message back to the source. References commit 2efdaaaf ("IPv6: reply ICMP error if the first fragment don't include all headers") Signed-off-by: NGeorg Kohmann <geokohma@cisco.com> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 16 11月, 2020 2 次提交
-
-
由 Anant Thazhemadam 提交于
In canfd_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CANFD_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: d4689846 ("can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-3-anant.thazhemadam@gmail.comSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Anant Thazhemadam 提交于
In can_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CAN_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: 8cb68751 ("can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-2-anant.thazhemadam@gmail.comSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
- 15 11月, 2020 4 次提交
-
-
由 Wang Hai 提交于
If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(), msg should be canceled by genlmsg_cancel(). Fixes: df38dafd ("devlink: implement shared buffer occupancy monitoring interface") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NWang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paul Moore 提交于
Static checking revealed that a previous fix to netlbl_unlabel_staticlist() leaves a stack variable uninitialized, this patches fixes that. Fixes: 866358ec ("netlabel: fix our progress tracking in netlbl_unlabel_staticlist()") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPaul Moore <paul@paul-moore.com> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/160530304068.15651.18355773009751195447.stgit@siflSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Xin Long 提交于
A call trace was found in Hangbin's Codenomicon testing with debug kernel: [ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp] [ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0 [ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0 [ 2616.191533] Call Trace: [ 2616.194265] <IRQ> [ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0 [ 2616.207336] slab_free_freelist_hook+0xeb/0x140 [ 2616.220971] kfree+0xd6/0x2c0 [ 2616.224293] rcu_do_batch+0x3bd/0xc70 [ 2616.243096] rcu_core+0x8b9/0xd00 [ 2616.256065] __do_softirq+0x23d/0xacd [ 2616.260166] irq_exit+0x236/0x2a0 [ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620 [ 2616.269138] apic_timer_interrupt+0xf/0x20 [ 2616.273711] </IRQ> This is because it holds asoc when transport->proto_unreach_timer starts and puts asoc when the timer stops, and without holding transport the transport could be freed when the timer is still running. So fix it by holding/putting transport instead for proto_unreach_timer in transport, just like other timers in transport. v1->v2: - Also use sctp_transport_put() for the "out_unlock:" path in sctp_generate_proto_unreach_event(), as Marcelo noticed. Fixes: 50b5d6ad ("sctp: Fix a race between ICMP protocol unreachable and connect()") Reported-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Stefano Garzarella 提交于
Before commit c0cfa2d8 ("vsock: add multi-transports support"), if a G2H transport was loaded (e.g. virtio transport), every packets was forwarded to the host, regardless of the destination CID. The H2G transports implemented until then (vhost-vsock, VMCI) always responded with an error, if the destination CID was not VMADDR_CID_HOST. From that commit, we are using the remote CID to decide which transport to use, so packets with remote CID > VMADDR_CID_HOST(2) are sent only through H2G transport. If no H2G is available, packets are discarded directly in the guest. Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour to implement sibling VMs communication, so we restore the old behavior when no H2G is registered. It will be up to the host to discard packets if the destination is not the right one. As it was already implemented before adding multi-transport support. Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra. [1] Documentation/virt/ne_overview.rst Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Dexuan Cui <decui@microsoft.com> Fixes: c0cfa2d8 ("vsock: add multi-transports support") Reported-by: NAndra Paraschiv <andraprs@amazon.com> Tested-by: NAndra Paraschiv <andraprs@amazon.com> Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 14 11月, 2020 2 次提交
-
-
由 Zhang Qilong 提交于
genlmsg_cancel() needs to be called in the error path of inet6_fill_ifmcaddr and inet6_fill_ifacaddr to cancel the message. Fixes: 6ecf4c37 ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NZhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201112080950.1476302-1-zhangqilong3@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jeff Dike 提交于
Commit 58956317 ("neighbor: Improve garbage collection") guarantees neighbour table entries a five-second lifetime. Processes which make heavy use of multicast can fill the neighour table with multicast addresses in five seconds. At that point, neighbour entries can't be GC-ed because they aren't five seconds old yet, the kernel log starts to fill up with "neighbor table overflow!" messages, and sends start to fail. This patch allows multicast addresses to be thrown out before they've lived out their five seconds. This makes room for non-multicast addresses and makes messages to all addresses more reliable in these circumstances. Fixes: 58956317 ("neighbor: Improve garbage collection") Signed-off-by: NJeff Dike <jdike@akamai.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 13 11月, 2020 6 次提交
-
-
由 Johannes Berg 提交于
If sta_info_insert_finish() fails, we currently keep the station around and free it only in the caller, but there's only one such caller and it always frees it immediately. As syzbot found, another consequence of this split is that we can put things that sleep only into __cleanup_single_sta() and not in sta_info_free(), but this is the only place that requires such of sta_info_free() now. Change this to free the station in sta_info_insert_finish(), in which case we can still sleep. This will also let us unify the cleanup code later. Cc: stable@vger.kernel.org Fixes: dcd479e1 ("mac80211: always wind down STA state") Reported-by: syzbot+32c6c38c4812d22f2f0b@syzkaller.appspotmail.com Reported-by: syzbot+4c81fe92e372d26c4246@syzkaller.appspotmail.com Reported-by: syzbot+6a7fe9faf0d1d61bc24a@syzkaller.appspotmail.com Reported-by: syzbot+abed06851c5ffe010921@syzkaller.appspotmail.com Reported-by: syzbot+b7aeb9318541a1c709f1@syzkaller.appspotmail.com Reported-by: syzbot+d5a9416c6cafe53b5dd0@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201112112201.ee6b397b9453.I9c31d667a0ea2151441cc64ed6613d36c18a48e0@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Xie He 提交于
The x25_disconnect function in x25_subr.c would decrease the refcount of "x25->neighbour" (struct x25_neigh) and reset this pointer to NULL. However, the x25_rx_call_request function in af_x25.c, which is called when we receive a connection request, does not increase the refcount when it assigns the pointer. Fix this issue by increasing the refcount of "struct x25_neigh" in x25_rx_call_request. This patch fixes frequent kernel crashes when using AF_X25 sockets. Fixes: 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: NXie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201112103506.5875-1-xie.he.0141@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Joel Stanley 提交于
If a user unbinds and re-binds a NC-SI aware driver the kernel will attempt to register the netlink interface at runtime. The structure is marked __ro_after_init so registration fails spectacularly at this point. # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/bind ftgmac100 1e660000.ethernet: Read MAC address 52:54:00:12:34:56 from chip ftgmac100 1e660000.ethernet: Using NCSI interface 8<--- cut here --- Unable to handle kernel paging request at virtual address 80a8f858 pgd = 8c768dd6 [80a8f858] *pgd=80a0841e(bad) Internal error: Oops: 80d [#1] SMP ARM CPU: 0 PID: 116 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00003-gdd25b227ec1e #51 Hardware name: Generic DT based system PC is at genl_register_family+0x1f8/0x6d4 LR is at 0xff26ffff pc : [<8073f930>] lr : [<ff26ffff>] psr: 20000153 sp : 8553bc80 ip : 81406244 fp : 8553bd04 r10: 8085d12c r9 : 80a8f73c r8 : 85739000 r7 : 00000017 r6 : 80a8f860 r5 : 80c8ab98 r4 : 80a8f858 r3 : 00000000 r2 : 00000000 r1 : 81406130 r0 : 00000017 Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none Control: 00c5387d Table: 85524008 DAC: 00000051 Process sh (pid: 116, stack limit = 0x1f1988d6) ... Backtrace: [<8073f738>] (genl_register_family) from [<80860ac0>] (ncsi_init_netlink+0x20/0x48) r10:8085d12c r9:80c8fb0c r8:85739000 r7:00000000 r6:81218000 r5:85739000 r4:8121c000 [<80860aa0>] (ncsi_init_netlink) from [<8085d740>] (ncsi_register_dev+0x1b0/0x210) r5:8121c400 r4:8121c000 [<8085d590>] (ncsi_register_dev) from [<805a8060>] (ftgmac100_probe+0x6e0/0x778) r10:00000004 r9:80950228 r8:8115bc10 r7:8115ab00 r6:9eae2c24 r5:813b6f88 r4:85739000 [<805a7980>] (ftgmac100_probe) from [<805355ec>] (platform_drv_probe+0x58/0xa8) r9:80c76bb0 r8:00000000 r7:80cd4974 r6:80c76bb0 r5:8115bc10 r4:00000000 [<80535594>] (platform_drv_probe) from [<80532d58>] (really_probe+0x204/0x514) r7:80cd4974 r6:00000000 r5:80cd4868 r4:8115bc10 Jakub pointed out that ncsi_register_dev is obviously broken, because there is only one family so it would never work if there was more than one ncsi netdev. Fix the crash by registering the netlink family once on boot, and drop the code to unregister it. Fixes: 955dc68c ("net/ncsi: Add generic netlink family") Signed-off-by: NJoel Stanley <joel@jms.id.au> Reviewed-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com> Link: https://lore.kernel.org/r/20201112061210.914621-1-joel@jms.id.auSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alexander Lobakin 提交于
udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the packet. While it's probably OK for non-frag0 paths, this helpers will also point to junk on Fast/frag0 GRO when all headers are located in frags. As a result, sk/skb lookup may fail or give wrong results. To support both GRO modes, skb_gro_network_header() might be used. To not modify original functions, add private versions of udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO. Present since the introduction of "application-level" UDP GRO in 4.7-rc1. Misc: replace totally unneeded ternaries with plain ifs. Fixes: a6024562 ("udp: Add GRO functions to UDP socket") Suggested-by: NWillem de Bruijn <willemb@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alexander Lobakin 提交于
UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's probably OK for non-frag0 paths (when all headers or even the entire frame are already in skb head), this inline points to junk when using Fast GRO (napi_gro_frags() or napi_gro_receive() with only Ethernet header in skb head and all the rest in the frags) and breaks GRO packet compilation and the packet flow itself. To support both modes, skb_gro_header_fast() + skb_gro_header_slow() are typically used. UDP even has an inline helper that makes use of them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() to get rid of the out-of-order delivers. Present since the introduction of plain UDP GRO in 5.0-rc1. Fixes: e20cf8d3 ("udp: implement GRO for plain UDP sockets.") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Parav Pandit 提交于
Cited commit in fixes tag overwrites the port attributes for the registered port. Avoid such error by checking registered flag before setting attributes. Fixes: 71ad8d55 ("devlink: Replace devlink_port_attrs_set parameters with a struct") Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 12 11月, 2020 5 次提交
-
-
由 Felix Fietkau 提交于
Some drivers fill the status rate list without setting the rate index after the final rate to -1. minstrel_ht already deals with this, but minstrel doesn't, which causes it to get stuck at the lowest rate on these drivers. Fix this by checking the count as well. Cc: stable@vger.kernel.org Fixes: cccf129f ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: NFelix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-3-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
Deferring sampling attempts to the second stage has some bad interactions with drivers that process the rate table in hardware and use the probe flag to indicate probing packets (e.g. most mt76 drivers). On affected drivers it can lead to probing not working at all. If the link conditions turn worse, it might not be such a good idea to do a lot of sampling for lower rates in this case. Fix this by simply skipping the sample attempt instead of deferring it, but keep the checks that would allow it to be sampled if it was skipped too often, but only if it has less than 95% success probability. Also ensure that IEEE80211_TX_CTL_RATE_CTRL_PROBE is set for all probing packets. Cc: stable@vger.kernel.org Fixes: cccf129f ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: NFelix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-2-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Felix Fietkau 提交于
After the status rework, ieee80211_tx_status_ext is leaking un-acknowledged packets for stations in powersave mode. To fix this, move the code handling those packets from __ieee80211_tx_status into ieee80211_tx_status_ext Reported-by: NTobias Waldvogel <tobias.waldvogel@gmail.com> Fixes: 3318111c ("mac80211: reduce duplication in tx status functions") Signed-off-by: NFelix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-1-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Claire Chang 提交于
If a device is getting removed or reprobed during resume, use-after-free might happen. For example, h5_btrtl_resume() schedules a work queue for device reprobing, which of course requires removal first. If the removal happens in parallel with the device_resume() and wins the race to acquire device_lock(), removal may remove the device from the PM lists and all, but device_resume() is already running and will continue when the lock can be acquired, thus calling rfkill_resume(). During this, if rfkill_set_block() is then called after the corresponding *_unregister() and kfree() are called, there will be an use-after-free in hci_rfkill_set_block(): BUG: KASAN: use-after-free in hci_rfkill_set_block+0x58/0xc0 [bluetooth] ... Call trace: dump_backtrace+0x0/0x154 show_stack+0x20/0x2c dump_stack+0xbc/0x12c print_address_description+0x88/0x4b0 __kasan_report+0x144/0x168 kasan_report+0x10/0x18 check_memory_region+0x19c/0x1ac __kasan_check_write+0x18/0x24 hci_rfkill_set_block+0x58/0xc0 [bluetooth] rfkill_set_block+0x9c/0x120 rfkill_resume+0x34/0x70 dpm_run_callback+0xf0/0x1f4 device_resume+0x210/0x22c Fix this by checking rfkill->registered in rfkill_resume(). device_del() in rfkill_unregister() requires device_lock() and the whole rfkill_resume() is also protected by the same lock via device_resume(), we can make sure either the rfkill->registered is false before rfkill_resume() starts or the rfkill device won't be unregistered before rfkill_resume() returns. As async_resume() holds a reference to the device, at this level there can be no use-after-free; only in the user that doesn't expect this scenario. Fixes: 8589086f ("Bluetooth: hci_h5: Turn off RTL8723BS on suspend, reprobe on resume") Signed-off-by: NClaire Chang <tientzu@chromium.org> Link: https://lore.kernel.org/r/20201110084908.219088-1-tientzu@chromium.org [edit commit message for clarity and add more info provided later] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Martin Schiller 提交于
This fixes a regression for blocking connects introduced by commit 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect"). The x25->neighbour is already set to "NULL" by x25_disconnect() now, while a blocking connect is waiting in x25_wait_for_connection_establishment(). Therefore x25->neighbour must not be accessed here again and x25->state is also already set to X25_STATE_0 by x25_disconnect(). Fixes: 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: NMartin Schiller <ms@dev.tdt.de> Reviewed-by: NXie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-