- 07 7月, 2018 3 次提交
-
-
由 Jon Maloy 提交于
The duplicate address discovery protocol is not safe against two discoverers running in parallel. The one executing first after the trial period is over will set the node address and change its own message type to DSC_REQ_MSG. The one executing last may find that the node address is already set, and never change message type, with the result that its links may never be established. In this commmit we ensure that the message type always is set correctly after the trial period is over. Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
With the duplicate address discovery protocol for tipc nodes addresses we introduced a one second trial period before a node is allocated a hash number to use as address. Unfortunately, we miss to handle the case when a regular LINK REQUEST/ RESPONSE arrives from a cluster node during the trial period. Such messages are not ignored as they should be, leading to links setup attempts while the node still has no address. Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
The function for checking if there is an node address conflict is supposed to return a suggestion for a new address if it finds a conflict, and zero otherwise. But in case the peer being checked is previously unknown it does instead return a "suggestion" for the checked address itself. This results in a DSC_TRIAL_FAIL_MSG being sent unecessarily to the peer, and sometimes makes the trial period starting over again. Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2018 1 次提交
-
-
由 Tyler Hicks 提交于
The low and high values of the net.ipv4.ping_group_range sysctl were being silently forced to the default disabled state when a write to the sysctl contained GIDs that didn't map to the associated user namespace. Confusingly, the sysctl's write operation would return success and then a subsequent read of the sysctl would indicate that the low and high values are the overflowgid. This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace. In such a situation, the previous value of the sysctl is preserved and that range will be returned in a subsequent read of the sysctl. Signed-off-by: NTyler Hicks <tyhicks@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 7月, 2018 3 次提交
-
-
由 Arun Kumar Neelakantam 提交于
All the control messages broadcast to remote routers are using QRTR_NODE_BCAST instead of using local router NODE ID which cause the packets to be dropped on remote router due to invalid NODE ID. Signed-off-by: NArun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Kumar Neelakantam 提交于
The broadcast node id should only be sent with the control port id. Signed-off-by: NArun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Moore 提交于
At present the ipv6_renew_options_kern() function ends up calling into access_ok() which is problematic if done from inside an interrupt as access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures (x86-64 is affected). Example warning/backtrace is shown below: WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90 ... Call Trace: <IRQ> ipv6_renew_option+0xb2/0xf0 ipv6_renew_options+0x26a/0x340 ipv6_renew_options_kern+0x2c/0x40 calipso_req_setattr+0x72/0xe0 netlbl_req_setattr+0x126/0x1b0 selinux_netlbl_inet_conn_request+0x80/0x100 selinux_inet_conn_request+0x6d/0xb0 security_inet_conn_request+0x32/0x50 tcp_conn_request+0x35f/0xe00 ? __lock_acquire+0x250/0x16c0 ? selinux_socket_sock_rcv_skb+0x1ae/0x210 ? tcp_rcv_state_process+0x289/0x106b tcp_rcv_state_process+0x289/0x106b ? tcp_v6_do_rcv+0x1a7/0x3c0 tcp_v6_do_rcv+0x1a7/0x3c0 tcp_v6_rcv+0xc82/0xcf0 ip6_input_finish+0x10d/0x690 ip6_input+0x45/0x1e0 ? ip6_rcv_finish+0x1d0/0x1d0 ipv6_rcv+0x32b/0x880 ? ip6_make_skb+0x1e0/0x1e0 __netif_receive_skb_core+0x6f2/0xdf0 ? process_backlog+0x85/0x250 ? process_backlog+0x85/0x250 ? process_backlog+0xec/0x250 process_backlog+0xec/0x250 net_rx_action+0x153/0x480 __do_softirq+0xd9/0x4f7 do_softirq_own_stack+0x2a/0x40 </IRQ> ... While not present in the backtrace, ipv6_renew_option() ends up calling access_ok() via the following chain: access_ok() _copy_from_user() copy_from_user() ipv6_renew_option() The fix presented in this patch is to perform the userspace copy earlier in the call chain such that it is only called when the option data is actually coming from userspace; that place is do_ipv6_setsockopt(). Not only does this solve the problem seen in the backtrace above, it also allows us to simplify the code quite a bit by removing ipv6_renew_options_kern() completely. We also take this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option() a small amount as well. This patch is heavily based on a rough patch by Al Viro. I've taken his original patch, converted a kmemdup() call in do_ipv6_setsockopt() to a memdup_user() call, made better use of the e_inval jump target in the same function, and cleaned up the use ipv6_renew_option() by ipv6_renew_options(). CC: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 7月, 2018 3 次提交
-
-
由 Xin Long 提交于
After commit b6c5734d ("sctp: fix the handling of ICMP Frag Needed for too small MTUs"), sctp_transport_update_pmtu would refetch pathmtu from the dst and set it to transport's pathmtu without any check. The new pathmtu may be lower than MINSEGMENT if the dst is obsolete and updated by .get_dst() in sctp_transport_update_pmtu. In this case, it could have a smaller MTU as well, and thus we should validate it against MINSEGMENT instead. Syzbot reported a warning in sctp_mtu_payload caused by this. This patch refetches the pathmtu by calling sctp_dst_mtu where it does the check against MINSEGMENT. v1->v2: - refetch the pathmtu by calling sctp_dst_mtu instead as Marcelo's suggestion. Fixes: b6c5734d ("sctp: fix the handling of ICMP Frag Needed for too small MTUs") Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
NetworkManager likes to manage linklocal prefix routes and does so with the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route code and by extension enable multipath routes with device only nexthops. Revert f34436a4 and these followup patches: 6eba08c3 ("ipv6: Only emit append events for appended routes"). ce45bded ("mlxsw: spectrum_router: Align with new route replace logic") 53b562df ("mlxsw: spectrum_router: Allow appending to dev-only routes") Update the fib_tests cases to reflect the old behavior. Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route") Signed-off-by: NDavid Ahern <dsahern@gmail.com>
-
由 Toke Høiland-Jørgensen 提交于
The gen_stats facility will add a header for the toplevel nlattr of type TCA_STATS2 that contains all stats added by qdisc callbacks. A reference to this header is stored in the gnet_dump struct, and when all the per-qdisc callbacks have finished adding their stats, the length of the containing header will be adjusted to the right value. However, on architectures that need padding (i.e., that don't set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added before the stats, which means that the stored pointer will point to the padding, and so when the header is fixed up, the result is just a very big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS struct, this problem has been mostly invisible, but we exposed it with the netlink attribute-based statistics in CAKE. Fix the issue by fixing up the stored pointer if it points to a padding nlattr. Tested-by: NPete Heist <pete@heistp.net> Tested-by: NKevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 7月, 2018 1 次提交
-
-
由 Doron Roberts-Kedes 提交于
The current code does not inspect the return value of skb_to_sgvec. This can cause a nullptr kernel panic when the malformed sgvec is passed into the crypto request. Checking the return value of skb_to_sgvec and skipping decryption if it is negative fixes this problem. Fixes: c46234eb ("tls: RX path for ktls") Acked-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 7月, 2018 2 次提交
-
-
由 Eric Biggers 提交于
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it. Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2018 1 次提交
-
-
由 Ilpo Järvinen 提交于
If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use more strict condition for non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose but to keep this change minimal, it will be done in another patch). Besides burstiness and congestion control violations, this problem can result in RTO loop: When the loss recovery is prematurely undoed, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 6月, 2018 5 次提交
-
-
由 Roopa Prabhu 提交于
After commit f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find for existing rule lookup in both the add and del paths. While this is good for the delete path, it solves a few problems but opens up a few invalid key matches in the add path. $ip -4 rule add table main tos 10 fwmark 1 $ip -4 rule add table main tos 10 RTNETLINK answers: File exists The problem here is rule_find does not check if the key masks in the new and old rule are the same and hence ends up matching a more secific rule. Rule key masks cannot be easily compared today without an elaborate if-else block. Its best to introduce key masks for easier and accurate rule comparison in the future. Until then, due to fear of regressions this patch re-introduces older loose rule_exists during add. Also fixes both rule_exists and rule_find to cover missing attributes. Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule") Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
As noticed by Eric, we need to switch to the helper dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too, otheriwse still miss dev_qdisc_change_tx_queue_len(). Fixes: 6a643ddb ("net: introduce helper dev_change_tx_queue_len()") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Sowmini reported that a recent commit broke prefix routes for linklocal addresses. The newly added modify_prefix_route is attempting to add a new prefix route when the ifp priority does not match the route metric however the check needs to account for the default priority. In addition, the route add fails because the route already exists, and then the delete removes the one that exists. Flip the order to do the delete first. Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes") Reported-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Hocko 提交于
alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations which is just a noop and a little bit confusing. __GFP_NORETRY was added by ed98df33 ("net: use __GFP_NORETRY for high order allocations") to prevent from the OOM killer. Yet this was not enough because fb05e7a8 ("net: don't wait for order-3 page allocation") didn't want an excessive reclaim for non-costly orders so it made it completely NOWAIT while it preserved __GFP_NORETRY in place which is now redundant. Drop the pointless __GFP_NORETRY because this function is used as copy&paste source for other places. Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts. Reported-by: NDaniele Iamartino <danielei@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2018 7 次提交
-
-
由 Christoph Hellwig 提交于
The big aio poll revert broke various network protocols that don't implement ->poll as a patch in the aio poll serie removed sock_no_poll and made the common code handle this case. Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: a11e1d43 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL") Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shakeel Butt 提交于
Currently the kernel accounts the memory for network traffic through mem_cgroup_[un]charge_skmem() interface. However the memory accounted only includes the truesize of sk_buff which does not include the size of sock objects. In our production environment, with opt-out kmem accounting, the sock kmem caches (TCP[v6], UDP[v6], RAW[v6], UNIX) are among the top most charged kmem caches and consume a significant amount of memory which can not be left as system overhead. So, this patch converts the kmem caches of all sock objects to SLAB_ACCOUNT. Signed-off-by: NShakeel Butt <shakeelb@google.com> Suggested-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johannes Berg 提交于
At the very least we should check the return value if nla_parse_nested() is called with a non-NULL policy. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Bob Copeland 提交于
Commit 9757235f, "nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT operation field in meshconf, while also adding checks requiring the non-greenfield and non-ht-sta bits to be set in certain circumstances. The latter bit is actually reserved for mesh BSSes according to Table 9-168 in 802.11-2016, so in fact it should not be set. wpa_supplicant sets these bits because the mesh and AP code share the same implementation, but authsae does not. As a result, some meshconf updates from authsae which set only the NONHT_MIXED protection bits were being rejected. In order to avoid breaking userspace by changing the rules again, simply accept the values with or without the bits set, and mask off the reserved bit to match the spec. While in here, update the 802.11-2012 reference to 802.11-2016. Fixes: 9757235f ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value") Cc: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: NBob Copeland <bobcopeland@fb.com> Reviewed-by: NMasashi Honma <masashi.honma@gmail.com> Reviewed-by: NMasashi Honma <masashi.honma@gmail.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Denis Kenzior 提交于
On pre-emption enabled kernels the following print was being seen due to missing local_bh_disable/local_bh_enable calls. mac80211 assumes that pre-emption is disabled in the data path. BUG: using smp_processor_id() in preemptible [00000000] code: iwd/517 caller is __ieee80211_subif_start_xmit+0x144/0x210 [mac80211] [...] Call Trace: dump_stack+0x5c/0x80 check_preemption_disabled.cold.0+0x46/0x51 __ieee80211_subif_start_xmit+0x144/0x210 [mac80211] Fixes: 91180649 ("mac80211: Add support for tx_control_port") Signed-off-by: NDenis Kenzior <denkenz@gmail.com> [commit message rewrite, fixes tag] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 David Ahern 提交于
For ACLs implemented using either FIB rules or FIB entries, the BPF program needs the FIB lookup status to be able to drop the packet. Since the bpf_fib_lookup API has not reached a released kernel yet, change the return code to contain an encoding of the FIB lookup result and return the nexthop device index in the params struct. In addition, inform the BPF program of any post FIB lookup reason as to why the packet needs to go up the stack. The fib result for unicast routes must have an egress device, so remove the check that it is non-NULL. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Linus Torvalds 提交于
The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 6月, 2018 5 次提交
-
-
由 Ursula Braun 提交于
The recent poll change may lead to stalls for non-blocking connecting SMC sockets, since sock_poll_wait is no longer performed on the internal CLC socket, but on the outer SMC socket. kernel_connect() on the internal CLC socket returns with -EINPROGRESS, but the wake up logic does not work in all cases. If the internal CLC socket is still in state TCP_SYN_SENT when polled, sock_poll_wait() from sock_poll() does not sleep. It is supposed to sleep till the state of the internal CLC socket switches to TCP_ESTABLISHED. This problem triggered a redesign of the SMC nonblocking connect logic. This patch introduces a connect worker covering all connect steps followed by a wake up of socket waiters. It allows to get rid of all delays and locks in smc_poll(). Fixes: c0129a06 ("smc: convert to ->poll_mask") Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/ tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink about our recent patch removing ~16 quick acks after ECN events. tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent, but in the case the sender cwnd was lowered to 1, we do not want to have a delayed ack for the next packet we will receive. Fixes: 522040ea ("tcp: do not aggressively quick ack after ECN events") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NNeal Cardwell <ncardwell@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Masahiro Yamada 提交于
What we want here is to embed a user-space program into the kernel. Instead of the complex ELF magic, let's simply wrap it in the assembly with the '.incbin' directive. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Doron Roberts-Kedes 提交于
On receving an incomplete message, the existing code stores the remaining length of the cloned skb in the early_eaten field instead of incrementing the value returned by __strp_recv. This defers invocation of sock_rfree for the current skb until the next invocation of __strp_recv, which returns early_eaten if early_eaten is non-zero. This behavior causes a stall when the current message occupies the very tail end of a massive skb, and strp_peek/need_bytes indicates that the remainder of the current message has yet to arrive on the socket. The TCP receive buffer is totally full, causing the TCP window to go to zero, so the remainder of the message will never arrive. Incrementing the value returned by __strp_recv by the amount otherwise stored in early_eaten prevents stalls of this nature. Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Masahiro Yamada 提交于
With the brand-new syntax extension of Kconfig, we can directly check the compiler capability in the configuration phase. If the cc-can-link.sh fails, the BPFILTER_UMH is automatically hidden by the dependency. I also deleted 'default n', which is no-op. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 6月, 2018 3 次提交
-
-
由 Jason A. Donenfeld 提交于
Two rules with different values of suppress_prefix or suppress_ifgroup are not the same. This fixes an -EEXIST when running: $ ip -4 rule add table main suppress_prefixlength 0 Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
The RDS core module creates rds_connections based on callbacks from rds_loop_transport when sending/receiving packets to local addresses. These connections will need to be cleaned up when they are created from a netns that is not init_net, and that netns is deleted. Add the changes aligned with the changes from commit ebeeb1ad ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") for rds_loop_transport Reported-and-tested-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme used by nf_conncount. When doing list walk, we lookup the tuple in the conntrack table. If the lookup fails we remove this tuple from our list because the conntrack entry is gone. This is the common cause, but turns out its not the only one. The list entry could have been created just before by another cpu, i.e. the conntrack entry might not yet have been inserted into the global hash. The avoid this, we introduce a timestamp and the owning cpu. If the entry appears to be stale, evict only if: 1. The current cpu is the one that added the entry, or, 2. The timestamp is older than two jiffies The second constraint allows GC to be taken over by other cpu too (e.g. because a cpu was offlined or napi got moved to another cpu). We can't pretend the 'doubtful' entry wasn't in our list. Instead, when we don't find an entry indicate via IS_ERR that entry was removed ('did not exist' or withheld ('might-be-unconfirmed'). This most likely also fixes a xt_connlimit imbalance earlier reported by Dmitry Andrianov. Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com> Reported-by: NJustin Pettit <jpettit@vmware.com> Reported-by: NYi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NYi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 26 6月, 2018 2 次提交
-
-
由 Jann Horn 提交于
The old code would indefinitely block other users of nf_log_mutex if a userspace access in proc_dostring() blocked e.g. due to a userfaultfd region. Fix it by moving proc_dostring() out of the locked region. This is a followup to commit 266d07cb ("netfilter: nf_log: fix sleeping function called from invalid context"), which changed this code from using rcu_read_lock() to taking nf_log_mutex. Fixes: 266d07cb ("netfilter: nf_log: fix sleeping function calle[...]") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jann Horn 提交于
When proc_dostring() is called with a non-zero offset in strict mode, it doesn't just write to the ->data buffer, it also reads. Make sure it doesn't read uninitialized data. Fixes: c6ac37d8 ("netfilter: nf_log: fix error on write NONE to [...]") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 23 6月, 2018 4 次提交
-
-
由 Cong Wang 提交于
In update_vf(): cftree_remove(cl); update_cfmin(cl->cl_parent); the cl_cfmin of cl->cl_parent is intentionally updated to 0 when that parent only has one child. And if this parent is root qdisc, we could end up, in hfsc_schedule_watchdog(), that we can't decide the next schedule time for qdisc watchdog. But it seems safe that we can just skip it, as this watchdog is not always scheduled anyway. Thanks to Marco for testing all the cases, nothing is broken. Reported-by: NMarco Berizzi <pupilla@libero.it> Tested-by: NMarco Berizzi <pupilla@libero.it> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
To compute delays, better not use time of the day which can be changed by admins or malicious programs. Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: dccp@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive. syzbot reported : BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: dccp@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hangbin Liu 提交于
After recieving MLD querys, we update idev->mc_maxdelay with max_delay from query header. This make the later unsolicited reports have the same interval with mc_maxdelay, which means we may send unsolicited reports with long interval time instead of default configured interval time. Also as we will not call ipv6_mc_reset() after device up. This issue will be there even after leave the group and join other groups. Fixes: fc4eba58 ("ipv6: make unsolicited report intervals configurable for mld") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-