- 02 1月, 2019 1 次提交
-
-
由 Deepa Dinamani 提交于
Al Viro mentioned (Message-ID <20170626041334.GZ10672@ZenIV.linux.org.uk>) that there is probably a race condition lurking in accesses of sk_stamp on 32-bit machines. sock->sk_stamp is of type ktime_t which is always an s64. On a 32 bit architecture, we might run into situations of unsafe access as the access to the field becomes non atomic. Use seqlocks for synchronization. This allows us to avoid using spinlocks for readers as readers do not need mutual exclusion. Another approach to solve this is to require sk_lock for all modifications of the timestamps. The current approach allows for timestamps to have their own lock: sk_stamp_lock. This allows for the patch to not compete with already existing critical sections, and side effects are limited to the paths in the patch. The addition of the new field maintains the data locality optimizations from commit 9115e8cd ("net: reorganize struct sock for better data locality") Note that all the instances of the sk_stamp accesses are either through the ioctl or the syscall recvmsg. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 12月, 2018 2 次提交
-
-
由 Pablo Neira Ayuso 提交于
Instead of removing a empty list node that might be reintroduced soon thereafter, tentatively place the empty list node on the list passed to tree_nodes_free(), then re-check if the list is empty again before erasing it from the tree. [ Florian: rebase on top of pending nf_conncount fixes ] Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reviewed-by: NShawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
'lookup' is always followed by 'add'. Merge both and make the list-walk part of nf_conncount_add(). This also avoids one unneeded unlock/re-lock pair. Extra care needs to be taken in count_tree, as we only hold rcu read lock, i.e. we can only insert to an existing tree node after acquiring its lock and making sure it has a nonzero count. As a zero count should be rare, just fall back to insert_tree() (which acquires tree lock). This issue and its solution were pointed out by Shawn Bohrer during patch review. Reviewed-by: NShawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 25 12月, 2018 1 次提交
-
-
由 Peter Oskolkov 提交于
Patch eedbbb0d "net: dccp: initialize (addr,port) ..." added calling to inet_hashinfo2_init() from dccp_init(). However, inet_hashinfo2_init() is marked as __init(), and thus the kernel panics when dccp is loaded as module. Removing __init() tag from inet_hashinfo2_init() is not feasible because it calls into __init functions in mm. This patch adds inet_hashinfo2_init_mod() function that can be called after the init phase is done; changes dccp_init() to call the new function; un-marks inet_hashinfo2_init() as exported. Fixes: eedbbb0d ("net: dccp: initialize (addr,port) ...") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPeter Oskolkov <posk@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2018 6 次提交
-
-
由 Peter Oskolkov 提交于
A minor code cleanup. Signed-off-by: NPeter Oskolkov <posk@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
remove the obsolete sysctl anchors and move auto_assign_helper_warned to avoid/cover a hole. Reduces size by 40 bytes on 64 bit. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
after moving sysctl handling into single place, the init functions can't fail anymore and some of the fini functions are empty. Remove them and change return type to void. This also simplifies error unwinding in conntrack module init path. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Only one caller, just place it where its needed. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Currently DNS resolvers that send both A and AAAA queries from same source port can trigger stream mode prematurely, which results in non-early-evictable conntrack entry for three minutes, even though DNS requests are done in a few milliseconds. Add a two second grace period where we continue to use the ordinary 30-second default timeout. Its enough for DNS request/response traffic, even if two request/reply packets are involved. ASSURED is still set, else conntrack (and thus a possible NAT mapping ...) gets zapped too in case conntrack table runs full. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 John Fastabend 提交于
A sockmap program that redirects through a kTLS ULP enabled socket will not work correctly because the ULP layer is skipped. This fixes the behavior to call through the ULP layer on redirect to ensure any operations required on the data stream at the ULP layer continue to be applied. To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid calling the BPF layer on a redirected message. This is required to avoid calling the BPF layer multiple times (possibly recursively) which is not the current/expected behavior without ULPs. In the future we may add a redirect flag if users _do_ want the policy applied again but this would need to work for both ULP and non-ULP sockets and be opt-in to avoid breaking existing programs. Also to avoid polluting the flag space with an internal flag we reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with MSG_WAITFORONE. Here WAITFORONE is specific to recv path and SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing to verify is user space API is masked correctly to ensure the flag can not be set by user. (Note this needs to be true regardless because we have internal flags already in-use that user space should not be able to set). But for completeness we have two UAPI paths into sendpage, sendfile and splice. In the sendfile case the function do_sendfile() zero's flags, ./fs/read_write.c: static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, size_t count, loff_t max) { ... fl = 0; #if 0 /* * We need to debate whether we can enable this or not. The * man page documents EAGAIN return for the output at least, * and the application is arguably buggy if it doesn't expect * EAGAIN on a non-blocking file descriptor. */ if (in.file->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif file_start_write(out.file); retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl); } In the splice case the pipe_to_sendpage "actor" is used which masks flags with SPLICE_F_MORE. ./fs/splice.c: static int pipe_to_sendpage(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { ... more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; ... } Confirming what we expect that internal flags are in fact internal to socket side. Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 20 12月, 2018 9 次提交
-
-
由 wenxu 提交于
ip l add dev tun type gretap external ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap For gretap Key example when the command set the id but don't set the TUNNEL_KEY flags. There is no key field in the send packet In the lwtunnel situation, some TUNNEL_FLAGS should can be set by userspace Signed-off-by: Nwenxu <wenxu@ucloud.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
this patch registers neigh doit handler. The doit handler returns a neigh entry given dst and dev. This is similar to route and fdb doit (get) handlers. Also moves nda_policy declaration from rtnetlink.c to neighbour.c Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Remove skb->sp and allocate secpath storage via extension infrastructure. This also reduces sk_buff by 8 bytes on x86_64. Total size of allyesconfig kernel is reduced slightly, as there is less inlined code (one conditional atomic op instead of two on skb_clone). No differences in throughput in following ipsec performance tests: - transport mode with aes on 10GB link - tunnel mode between two network namespaces with aes and null cipher Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Will reduce noise when skb->sp is removed later in this series. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
skb_sec_path gains 'const' qualifier to avoid xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type same reasoning as previous conversions: Won't need to touch these spots anymore when skb->sp is removed. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Future patch will remove skb->sp pointer. To reduce noise in those patches, move existing helper to sk_buff and use it in more places to ease skb->sp replacement later. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
It can only return 0 (success) or -ENOMEM. Change return value to a pointer to secpath struct. This avoids direct access to skb->sp: err = secpath_set(skb); if (!err) .. skb->sp-> ... Becomes: sp = secpath_set(skb) if (!sp) .. sp-> .. This reduces noise in followup patch which is going to remove skb->sp. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This converts the bridge netfilter (calling iptables hooks from bridge) facility to use the extension infrastructure. The bridge_nf specific hooks in skb clone and free paths are removed, they have been replaced by the skb_ext hooks that do the same as the bridge nf allocations hooks did. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This pointer is going to be removed soon, so use the existing helpers in more places to avoid noise when the removal happens. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 12月, 2018 13 次提交
-
-
由 Emmanuel Grumbach 提交于
TWT is a feature that was added in 11ah and enhanced in 11ax. There are two bits that need to be set if we want to use the feature in 11ax: one in the HE Capability IE and one in the Extended Capability IE. This is because of backward compatibility between 11ah and 11ax. In order to simplify the flow for the low level driver in managed mode, aggregate the two bits and add a boolean that tells whether TWT is supported or not, but only if 11ax is supported. Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: NLuca Coelho <luciano.coelho@intel.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
In the iwlwifi conversion, we sometimes call this from outside of the wake_tx_queue() method, and in those cases must be in an RCU critical section. Document this requirement. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NLuca Coelho <luciano.coelho@intel.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
The older code and current userspace assumed that this data is the content of the Measurement Report element, starting with the Measurement Token. Clarify this in the documentation. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in FTM due to missing "struct" keyword. Fixes 109 warnings from <net/cfg80211.h>: ../include/net/cfg80211.h:2838: warning: cannot understand function prototype: 'struct cfg80211_ftm_responder_stats ' and fixes 88 warnings from <net/mac80211.h>: ../include/net/mac80211.h:477: warning: cannot understand function prototype: 'struct ieee80211_ftm_responder_params ' Fixes: 81e54d08 ("cfg80211: support FTM responder configuration/statistics") Fixes: bc847970 ("mac80211: support FTM responder configuration/statistics") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: David Spinadel <david.spinadel@intel.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Willem de Bruijn 提交于
SOF_TIMESTAMPING_OPT_ID is supported on TCP, UDP and RAW sockets. But it was missing on RAW with IPPROTO_IP, PF_PACKET and CAN. Add skb_setup_tx_timestamp that configures both tx_flags and tskey for these paths that do not need corking or use bytestream keys. Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams") Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This removes the (now empty) nf_nat_l4proto struct, all its instances and all the no longer needed runtime (un)register functionality. nf_nat_need_gre() can be axed as well: the module that calls it (to load the no-longer-existing nat_gre module) also calls other nat core functions. GRE nat is now always available if kernel is built with it. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
This removes the last l4proto indirection, the two callers, the l3proto packet mangling helpers for ipv4 and ipv6, now call the nf_nat_l4proto_manip_pkt() helper. nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though they contain no functionality anymore to not clutter this patch. Next patch will remove the empty files and the nf_nat_l4proto struct. nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the other nat manip functionality as well, not just udp and udplite. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
all protocols did set this to nf_nat_l4proto_nlattr_to_range, so just call it directly. The important difference is that we'll now also call it for protocols that we don't support (i.e., nf_nat_proto_unknown did not provide .nlattr_to_range). However, there should be no harm, even icmp provided this callback. If we don't implement a specific l4nat for this, nothing would make use of this information, so adding a big switch/case construct listing all supported l4protocols seems a bit pointless. This change leaves a single function pointer in the l4proto struct. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
With exception of icmp, all of the l4 nat protocols set this to nf_nat_l4proto_in_range. Get rid of this and just check the l4proto in the caller. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
No need for indirections here, we only support ipv4 and ipv6 and the called functions are very small. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
fold remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple. The static-save of old incarnation of resolved key in gre and icmp is removed as well, just use the prandom based offset like the others. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
almost all l4proto->unique_tuple implementations just call this helper, so make ->unique_tuple() optional and call its helper directly if the l4proto doesn't override it. This is an intermediate step to get rid of ->unique_tuple completely. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Historically this was net_random() based, and was then converted to a hash based algorithm (private boot seed + hash of endpoint addresses) due to concerns of leaking net_random() bits. RANDOM_FULLY mode was added later to avoid problems with hash based mode (see commit 34ce3240, "netfilter: nf_nat: add full port randomization support" for details). Just make prandom_u32() the default search starting point and get rid of ->secure_port() altogether. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 17 12月, 2018 2 次提交
-
-
由 Tristram Ha 提交于
Rename the tag Kconfig option and related macros in preparation for addition of new KSZ family switches with different tag formats. Signed-off-by: NTristram Ha <Tristram.Ha@microchip.com> Signed-off-by: NMarek Vasut <marex@denx.de> Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Similar to routes and rules, add protocol attribute to neighbor entries for easier tracking of how each was created. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 12月, 2018 3 次提交
-
-
由 Paolo Abeni 提交于
This avoids an indirect call in the receive path for TCP and UDP packets. TCP takes precedence on UDP, so that we have a single additional conditional in the common case. When IPV6 is build as module, all gro symbols except UDPv6 are builtin, while the latter belong to the ipv6 module, so we need some special care. v1 -> v2: - adapted to INDIRECT_CALL_ changes v2 -> v3: - fix build issue with CONFIG_IPV6=m Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This avoids an indirect calls for L3 GRO receive path, both for ipv4 and ipv6, if the latter is not compiled as a module. Note that when IPv6 is compiled as builtin, it will be checked first, so we have a single additional compare for the more common path. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes. Ensure ha element is always 8-byte aligned. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 12月, 2018 3 次提交
-
-
由 David Ahern 提交于
neigh_update_ext_learned has one caller in neighbour.c so does not need to be defined in the header. Move it and in the process remove the intialization of ndm_flags and just set it based on the flags check. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
After commit 69bd4840 ("net/sched: Remove egdev mechanism"), tc_setup_cb_call() is nearly identical to tcf_block_cb_call(), so we can just fold tcf_block_cb_call() into tc_setup_cb_call() and remove its unused parameter 'exts'. Fixes: 69bd4840 ("net/sched: Remove egdev mechanism") Cc: Oz Shlomo <ozsh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NOz Shlomo <ozsh@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Atul Gupta 提交于
HW unhash within mutex for registered tls devices cause sleep when called from tcp_set_state for TCP_CLOSE. Release lock and re-acquire after function call with ref count incr/dec. defined kref and fp release for tls_device to ensure device is not released outside lock. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7 INFO: lockdep is turned off. CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O Call Trace: <IRQ> dump_stack+0x5e/0x8b ___might_sleep+0x222/0x260 __mutex_lock+0x5c/0xa50 ? vprintk_emit+0x1f3/0x440 ? kmem_cache_free+0x22d/0x2a0 ? tls_hw_unhash+0x2f/0x80 ? printk+0x52/0x6e ? tls_hw_unhash+0x2f/0x80 tls_hw_unhash+0x2f/0x80 tcp_set_state+0x5f/0x180 tcp_done+0x2e/0xe0 tcp_rcv_state_process+0x92c/0xdd3 ? lock_acquire+0xf5/0x1f0 ? tcp_v4_rcv+0xa7c/0xbe0 ? tcp_v4_do_rcv+0x70/0x1e0 Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-