- 28 3月, 2020 1 次提交
-
-
由 Daniel Borkmann 提交于
In Cilium we're mainly using BPF cgroup hooks today in order to implement kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*), ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic between Cilium managed nodes. While this works in its current shape and avoids packet-level NAT for inter Cilium managed node traffic, there is one major limitation we're facing today, that is, lack of netns awareness. In Kubernetes, the concept of Pods (which hold one or multiple containers) has been built around network namespaces, so while we can use the global scope of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing NodePort ports on loopback addresses), we also have the need to differentiate between initial network namespaces and non-initial one. For example, ExternalIP services mandate that non-local service IPs are not to be translated from the host (initial) network namespace as one example. Right now, we have an ugly work-around in place where non-local service IPs for ExternalIP services are not xlated from connect() and friends BPF hooks but instead via less efficient packet-level NAT on the veth tc ingress hook for Pod traffic. On top of determining whether we're in initial or non-initial network namespace we also have a need for a socket-cookie like mechanism for network namespaces scope. Socket cookies have the nice property that they can be combined as part of the key structure e.g. for BPF LRU maps without having to worry that the cookie could be recycled. We are planning to use this for our sessionAffinity implementation for services. Therefore, add a new bpf_get_netns_cookie() helper which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would provide the cookie for the initial network namespace while passing the context instead of NULL would provide the cookie from the application's network namespace. We're using a hole, so no size increase; the assignment happens only once. Therefore this allows for a comparison on initial namespace as well as regular cookie usage as we have today with socket cookies. We could later on enable this helper for other program types as well as we would see need. (*) Both externalTrafficPolicy={Local|Cluster} types [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.cSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
-
- 17 1月, 2020 1 次提交
-
-
由 Guillaume Nault 提交于
Mark function parameters as 'const' where possible. Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 1月, 2020 3 次提交
-
-
由 Guillaume Nault 提交于
When peernet2id() had to lock "nsid_lock" before iterating through the nsid table, we had to disable BHs, because VXLAN can call peernet2id() from the xmit path: vxlan_xmit() -> vxlan_fdb_miss() -> vxlan_fdb_notify() -> __vxlan_fdb_notify() -> vxlan_fdb_info() -> peernet2id(). Now that peernet2id() uses RCU protection, "nsid_lock" isn't used in BH context anymore. Therefore, we can safely use plain spin_lock()/spin_unlock() and let BHs run when holding "nsid_lock". Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guillaume Nault 提交于
__peernet2id() can be protected by RCU as it only calls idr_for_each(), which is RCU-safe, and never modifies the nsid table. rtnl_net_dumpid() can also do lockless lookups. It does two nested idr_for_each() calls on nsid tables (one direct call and one indirect call because of rtnl_net_dumpid_one() calling __peernet2id()). The netnsid tables are never updated. Therefore it is safe to not take the nsid_lock and run within an RCU-critical section instead. Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guillaume Nault 提交于
__peernet2id_alloc() was used for both plain lookups and for netns ID allocations (depending the value of '*alloc'). Let's separate lookups from allocations instead. That is, integrate the lookup code into __peernet2id() and make peernet2id_alloc() responsible for allocating new netns IDs when necessary. This makes it clear that __peernet2id() doesn't modify the idr and prepares the code for lockless lookups. Also, mark the 'net' argument of __peernet2id() as 'const', since we're modifying this line. Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 10月, 2019 1 次提交
-
-
由 Guillaume Nault 提交于
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, but there are a few paths calling rtnl_net_notifyid() from atomic context or from RCU critical sections. The later also precludes the use of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() call is wrong too, as it uses GFP_KERNEL unconditionally. Therefore, we need to pass the GFP flags as parameter and propagate it through function calls until the proper flags can be determined. In most cases, GFP_KERNEL is fine. The exceptions are: * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() indirectly call rtnl_net_notifyid() from RCU critical section, * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as parameter. Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used by nlmsg_new(). The function is allowed to sleep, so better make the flags consistent with the ones used in the following ovs_vport_cmd_fill_info() call. Found by code inspection. Fixes: 9a963454 ("netns: notify netns id events") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 10月, 2019 1 次提交
-
-
由 Takeshi Misawa 提交于
If copy_net_ns() failed after net_alloc(), net->key_domain is leaked. Fix this, by freeing key_domain in error path. syzbot report: BUG: memory leak unreferenced object 0xffff8881175007e0 (size 32): comm "syz-executor902", pid 7069, jiffies 4294944350 (age 28.400s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a83ed741>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000a83ed741>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000a83ed741>] slab_alloc mm/slab.c:3326 [inline] [<00000000a83ed741>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<0000000059fc92b9>] kmalloc include/linux/slab.h:547 [inline] [<0000000059fc92b9>] kzalloc include/linux/slab.h:742 [inline] [<0000000059fc92b9>] net_alloc net/core/net_namespace.c:398 [inline] [<0000000059fc92b9>] copy_net_ns+0xb2/0x220 net/core/net_namespace.c:445 [<00000000a9d74bbc>] create_new_namespaces+0x141/0x2a0 kernel/nsproxy.c:103 [<000000008047d645>] unshare_nsproxy_namespaces+0x7f/0x100 kernel/nsproxy.c:202 [<000000005993ea6e>] ksys_unshare+0x236/0x490 kernel/fork.c:2674 [<0000000019417e75>] __do_sys_unshare kernel/fork.c:2742 [inline] [<0000000019417e75>] __se_sys_unshare kernel/fork.c:2740 [inline] [<0000000019417e75>] __x64_sys_unshare+0x16/0x20 kernel/fork.c:2740 [<00000000f4c5f2c8>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296 [<0000000038550184>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 syzbot also reported other leak in copy_net_ns -> setup_net. This problem is already fixed by cf47a0b8. Fixes: 9b242610 ("keys: Network namespace domain tag") Reported-and-tested-by: syzbot+3b3296d032353c33184b@syzkaller.appspotmail.com Signed-off-by: NTakeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 10月, 2019 1 次提交
-
-
由 Nicolas Dichtel 提交于
The flag NLM_F_ECHO aims to reply to the user the message notified to all listeners. It was not the case with the command RTM_NEWNSID, let's fix this. Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids") Reported-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NGuillaume Nault <gnault@redhat.com> Tested-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
- 27 6月, 2019 1 次提交
-
-
由 David Howells 提交于
Create key domain tags for network namespaces and make it possible to automatically tag keys that are used by networked services (e.g. AF_RXRPC, AFS, DNS) with the default network namespace if not set by the caller. This allows keys with the same description but in different namespaces to coexist within a keyring. Signed-off-by: NDavid Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-afs@lists.infradead.org
-
- 23 6月, 2019 1 次提交
-
-
由 Li RongQing 提交于
ops has been iterated to first element when call pre_exit, and it needs to restore from save_ops, not save ops to save_ops Fixes: d7d99872 ("netns: add pre_exit method to struct pernet_operations") Signed-off-by: NLi RongQing <lirongqing@baidu.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 6月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
Current struct pernet_operations exit() handlers are highly discouraged to call synchronize_rcu(). There are cases where we need them, and exit_batch() does not help the common case where a single netns is dismantled. This patch leverages the existing synchronize_rcu() call in cleanup_net() Calling optional ->pre_exit() method before ->exit() or ->exit_batch() allows to benefit from a single synchronize_rcu() call. Note that the synchronize_rcu() calls added in this patch are only in error paths or slow paths. Tested: $ time for i in {1..1000}; do unshare -n /bin/false;done real 0m2.612s user 0m0.171s sys 0m2.216s Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 5月, 2019 1 次提交
-
-
由 Thomas Gleixner 提交于
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 4月, 2019 1 次提交
-
-
由 Johannes Berg 提交于
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 4月, 2019 1 次提交
-
-
由 Guillaume Nault 提交于
NETNSA_NSID is signed. Use nla_get_s32() to avoid confusion. Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 3月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
net_hash_mix() currently uses kernel address of a struct net, and is used in many places that could be used to reveal this address to a patient attacker, thus defeating KASLR, for the typical case (initial net namespace, &init_net is not dynamically allocated) I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first. Also provide entropy regardless of CONFIG_NET_NS. Fixes: 0b441916 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAmit Klein <aksecurity@gmail.com> Reported-by: NBenny Pinkas <benny@pinkas.net> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2019 1 次提交
-
-
由 Jakub Kicinski 提交于
Make RTM_GETNSID's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. v2: - don't check size >= sizeof(struct rtgenmsg) (Nicolas). Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 12月, 2018 1 次提交
-
-
由 Aditya Pakki 提交于
In net_ns_init(), register_pernet_subsys() could fail while registering network namespace subsystems. The fix checks the return value and sends a panic() on failure. Signed-off-by: NAditya Pakki <pakki001@umn.edu> Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 11月, 2018 5 次提交
-
-
由 Nicolas Dichtel 提交于
Like the previous patch, the goal is to ease to convert nsids from one netns to another netns. A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one netns to a nsid of another netns. This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the user to interpret a nsid received from an other netns. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Like it was done for link and address, add the ability to perform get/dump in another netns by specifying a target nsid attribute. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
This is a preparatory work. To avoid having to much arguments for the function rtnl_net_fill(), a new structure is defined. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
This argument is not used anymore. Fixes: cab3c8ec ("netns: always provide the id to rtnl_net_fill()") Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2018 1 次提交
-
-
由 David Ahern 提交于
Update rtnl_net_dumpid for strict data checking. If the flag is set, the dump request is expected to have an rtgenmsg struct as the header which has the family as the only element. No data may be appended. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 8月, 2018 1 次提交
-
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
-
- 21 7月, 2018 1 次提交
-
-
由 Tyler Hicks 提交于
Make net_ns_get_ownership() reusable by networking code outside of core. This is useful, for example, to allow bridge related sysfs files to be owned by container root. Add a function comment since this is a potentially dangerous function to use given the way that kobject_get_ownership() works by initializing uid and gid before calling .get_ownership(). Signed-off-by: NTyler Hicks <tyhicks@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 4月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
This function calls call_netdevice_notifier(), which also may take net_rwsem. So, we can't use net_rwsem here. This patch makes callers of this functions take pernet_ops_rwsem, like register_netdevice_notifier() does. This will protect the modifications of net_namespace_list, and allows notifiers to take it (they won't have to care about context). Since __rtnl_link_unregister() is used on module load and unload (which are not frequent operations), this looks for me better, than make all call_netdevice_notifier() always executing in "protected net_namespace_list" context. Also, this fixes the problem we had a deal in 328fbe74 "Close race between {un, }register_netdevice_notifier and ...", and guarantees __rtnl_link_unregister() does not skip exitting net. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 3月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 3月, 2018 4 次提交
-
-
由 Kirill Tkhai 提交于
This adds comments to different places to improve readability. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
net_sem is some undefined area name, so it will be better to make the area more defined. Rename it to pernet_ops_rwsem for better readability and better intelligibility. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
All pernet_operations are reviewed and converted, hooray! Reflect this in core code: setup_net() and cleanup_net() will take down_read() always. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 3月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
Since ra_chain is per-net, we may use per-net mutexes to protect them in ip_ra_control(). This improves scalability. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
The patch adds SLAB_ACCOUNT to flags of net_cachep cache, which enables accounting of struct net memory to memcg kmem. Since number of net_namespaces may be significant, user want to know, how much there were consumed, and control. Note, that we do not account net_generic to the same memcg, where net was accounted, moreover, we don't do this at all (*). We do not want the situation, when single memcg memory deficit prevents us to register new pernet_operations. (*)Even despite there is !current process accounting already available in linux-next. See kmalloc_memcg() there for the details. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2018 1 次提交
-
-
由 Alexey Dobriyan 提交于
All kmem caches aren't reallocated once set up. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 2月, 2018 3 次提交
-
-
由 Kirill Tkhai 提交于
When llist_add() returns false, cleanup_net() hasn't made its llist_del_all(), while the work has already been scheduled by the first queuer. So, we may skip queue_work() in this case. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
This simplifies cleanup queueing and makes cleanup lists to use llist primitives. Since llist has its own cmpxchg() ordering, cleanup_list_lock is not more need. Also, struct llist_node is smaller, than struct list_head, so we save some bytes in struct net with this patch. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
We take net_mutex, when there are !async pernet_operations registered, and read locking of net_sem is not enough. But we may get rid of taking the mutex, and just change the logic to write lock net_sem in such cases. This obviously reduces the number of lock operations, we do. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2018 3 次提交
-
-
由 Kirill Tkhai 提交于
net_defaults_ops introduce only net_defaults_init_net method, and it acts on net::core::sysctl_somaxconn, which is not interesting for the rest of pernet_subsys and pernet_device lists. Then, make them async. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
This patch starts to convert pernet_subsys, registered from pure initcalls. net_ns_ops::net_ns_net_init/net_ns_net_init, methods use only ida_simple_* functions, which are not need a synchronization. They are synchronized by idr subsystem. So, net_ns_ops methods are able to be executed in parallel with methods of other pernet operations. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
This adds new pernet_operations::async flag to indicate operations, which ->init(), ->exit() and ->exit_batch() methods are allowed to be executed in parallel with the methods of any other pernet_operations. When there are only asynchronous pernet_operations in the system, net_mutex won't be taken for a net construction and destruction. Also, remove BUG_ON(mutex_is_locked()) from net_assign_generic() without replacing with the equivalent net_sem check, as there is one more lockdep assert below. v3: Add comment near net_mutex. Suggested-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-