1. 28 Mar 2018 (4 commits)
  2. 23 Mar 2018 (1 commit)
  3. 07 Mar 2018 (1 commit)
    • net: Make account struct net to memcg · 30855ffc
      Kirill Tkhai committed
      The patch adds SLAB_ACCOUNT to the flags of the net_cachep cache,
      which enables accounting of struct net memory to memcg kmem.
      Since the number of net namespaces may be significant, users want
      to know how much memory they consume, and to control it (a sketch
      of the change follows this entry).
      
      Note that we do not account net_generic to the same memcg where
      the net was accounted; in fact, we do not account it at all (*).
      We do not want a situation where a single memcg's memory deficit
      prevents us from registering new pernet_operations.
      
      (*) Even though !current process accounting is already available
      in linux-next; see kmalloc_memcg() there for the details.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30855ffc
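      
      A minimal sketch of the change described above, assuming the cache
      is created during netns setup; the size/alignment arguments are
      illustrative rather than copied from the tree:
      
      	/* SLAB_ACCOUNT charges each allocated struct net to the
      	 * current task's memory cgroup (kmem accounting).
      	 */
      	net_cachep = kmem_cache_create("net_namespace",
      				       sizeof(struct net),
      				       SMP_CACHE_BYTES,
      				       SLAB_PANIC | SLAB_ACCOUNT, NULL);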
  4. 27 Feb 2018 (1 commit)
  5. 21 Feb 2018 (3 commits)
  6. 13 Feb 2018 (7 commits)
    • net: Convert net_defaults_ops · ff291d00
      Kirill Tkhai committed
      net_defaults_ops introduces only the net_defaults_init_net method,
      and it acts on net::core::sysctl_somaxconn, which is of no
      interest to the rest of the pernet_subsys and pernet_device lists.
      So, make it async (as sketched after this entry).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff291d00
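      
      A sketch of the shape of this conversion, assuming the ::async
      flag introduced by commit 447cd7a0 below:
      
      	static struct pernet_operations net_defaults_ops = {
      		.init = net_defaults_init_net,
      		.async = true,	/* safe: touches only sysctl_somaxconn */
      	};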
    • net: Convert net_ns_ops methods · 3fc3b827
      Kirill Tkhai committed
      This patch starts converting the pernet_subsys entries registered
      from pure initcalls.
      
      The net_ns_ops::net_ns_net_init/net_ns_net_exit methods use only
      ida_simple_* functions, which need no extra synchronization:
      they are synchronized by the idr subsystem itself.
      
      So, the net_ns_ops methods can be executed in parallel with the
      methods of other pernet operations (see the sketch after this
      entry).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fc3b827
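      
      A hypothetical pernet init method showing why no pernet-level
      locking is needed here: the ida is locked internally, so the
      method is safe to run concurrently with other pernet_operations
      (example_ida and example_init are illustrative names, not from
      the tree):
      
      	static DEFINE_IDA(example_ida);
      
      	static int __net_init example_init(struct net *net)
      	{
      		/* ida_simple_get() serializes internally */
      		int id = ida_simple_get(&example_ida, 1, 0, GFP_KERNEL);
      
      		return id < 0 ? id : 0;
      	}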
    • net: Allow pernet_operations to be executed in parallel · 447cd7a0
      Kirill Tkhai committed
      This adds a new pernet_operations::async flag to mark operations
      whose ->init(), ->exit() and ->exit_batch() methods are allowed
      to be executed in parallel with the methods of any other
      pernet_operations.
      
      When only asynchronous pernet_operations exist in the system,
      net_mutex is not taken for net construction and destruction
      (see the sketch after this entry).
      
      Also, remove the BUG_ON(mutex_is_locked()) from net_assign_generic()
      without replacing it with an equivalent net_sem check, as there is
      one more lockdep assert below.
      
      v3: Add a comment near net_mutex.
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      447cd7a0
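      
      A simplified sketch of the conditional locking this flag enables;
      the nr_sync_pernet_ops counter is an assumption used here for
      illustration (the tree tracks synchronous ops in its own way),
      and error handling is elided:
      
      	down_read(&net_sem);		/* pernet_list is stable */
      	if (nr_sync_pernet_ops)		/* any !async ops registered? */
      		mutex_lock(&net_mutex);
      	setup_net(net, user_ns);
      	if (nr_sync_pernet_ops)
      		mutex_unlock(&net_mutex);
      	up_read(&net_sem);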
    • net: Move mutex_unlock() in cleanup_net() up · bcab1ddd
      Kirill Tkhai committed
      net_sem protects against pernet_list changes, while ops_free_list()
      performs a simple kfree() and cannot race with other
      pernet_operations callbacks.
      
      So we may release net_mutex earlier than before (as sketched after
      this entry).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcab1ddd
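      
      A sketch of the reordering in cleanup_net(), based on the loops
      quoted in commit 1a57feb8 below (simplified):
      
      	list_for_each_entry_reverse(ops, &pernet_list, list)
      		ops_exit_list(ops, &net_exit_list);
      
      	mutex_unlock(&net_mutex);	/* was after ops_free_list() */
      
      	/* kfree() of per-net data; cannot race with pernet callbacks */
      	list_for_each_entry_reverse(ops, &pernet_list, list)
      		ops_free_list(ops, &net_exit_list);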
    • net: Introduce net_sem for protection of pernet_list · 1a57feb8
      Kirill Tkhai committed
      Currently, the mutex is mostly used to protect the pernet
      operations list. It orders setup_net() and cleanup_net() against
      parallel {un,}register_pernet_operations() calls, so that the
      ->exit{,_batch} methods executed for a dying net belong to the
      same pernet operations whose ->init methods were called, even
      after the net namespace is unlinked from net_namespace_list in
      cleanup_net().
      
      But there are several scalability problems. The first is that no
      more than one net can be created or destroyed at a time on a
      node. For big machines with many cpus running many containers
      this is very noticeable.
      
      The second is that we need to synchronize_rcu() after a net is
      removed from net_namespace_list:
      
      Destroy net_ns:
      cleanup_net()
        mutex_lock(&net_mutex)
        list_del_rcu(&net->list)
        synchronize_rcu()                                  <--- Sleep there for ages
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list)
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_free_list(ops, &net_exit_list)
        mutex_unlock(&net_mutex)
      
      This primitive is not fast, especially on systems with many
      processors and/or when preemptible RCU is enabled in the config.
      So, for the whole time cleanup_net() is waiting for the RCU grace
      period, creation of new net namespaces is not possible; the tasks
      performing it sleep on the same mutex:
      
      Create net_ns:
      copy_net_ns()
        mutex_lock_killable(&net_mutex)                    <--- Sleep there for ages
      
      I observed 20-30 second hangs of "unshare -n" on an ordinary
      8-cpu laptop with preemptible RCU enabled after a CRIU test round
      finished.
      
      The solution is to convert net_mutex into an rw_semaphore and to
      add fine-grained locks to the really small number of
      pernet_operations that actually need them.
      
      Then, pernet_operations::init/::exit methods modifying
      net-related data will require only down_read() locking, while
      down_write() will be used for changing pernet_list (i.e., when
      modules are being loaded and unloaded).
      
      This gives a significant performance increase once the whole
      patch set is applied, as you can see here:
      
      %for i in {1..10000}; do unshare -n bash -c exit; done
      
      *before*
      real 1m40,377s
      user 0m9,672s
      sys 0m19,928s
      
      *after*
      real 0m17,007s
      user 0m5,311s
      sys 0m11,779s
      
      (5.8 times faster)
      
      This patch starts replacing net_mutex with net_sem. It adds the
      rw_semaphore, documents the variables it protects, and uses it
      where appropriate (see the sketch after this entry). net_mutex is
      still present, and the next patches will kick it out step by step.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a57feb8
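      
      A sketch of the target locking scheme, using the names from the
      commit text (simplified; error handling omitted):
      
      	DECLARE_RWSEM(net_sem);
      
      	/* net creation/destruction: pernet_list must not change */
      	down_read(&net_sem);
      	/* ... run ->init()/->exit() methods over pernet_list ... */
      	up_read(&net_sem);
      
      	/* (un)register_pernet_operations(): pernet_list changes */
      	down_write(&net_sem);
      	/* ... list_add()/list_del() on pernet_list ... */
      	up_write(&net_sem);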
    • net: Cleanup in copy_net_ns() · 5ba049a5
      Kirill Tkhai committed
      Line up the destructor actions in the reverse order of the
      constructors. The next patches will add more actions, and this
      ordering will make that convenient.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ba049a5
    • net: Assign net to net_namespace_list in setup_net() · 98f6c533
      Kirill Tkhai committed
      This patch merges two repeated pieces of code into one; it now
      lives in setup_net().
      
      The only change is that assignment:
      
      	init_net_initialized = true;
      
      becomes reordered with:
      
      	list_add_tail_rcu(&net->list, &net_namespace_list);
      
      The order has no visible effect, and this is a simple cleanup,
      because:
      
      init_net_initialized is used in the !CONFIG_NET_NS case
      to order the proc_net_ns_ops registration occurring at boot time:
      
      	start_kernel()->proc_root_init()->proc_net_init(),
      with
      	net_ns_init()->setup_net(&init_net, &init_user_ns)
      
      also occurring at boot time from the same init_task.
      
      When there are no other tasks to race with, it does not matter to
      the single task in which order two sequential independent writes
      are made. So we reorder them.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98f6c533
  7. 26 Jan 2018 (1 commit)
    • net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme · fb07a820
      Kirill Tkhai committed
      Currently, we unhash a dying net from the netns_ids lists under
      rtnl_lock(). This is a leftover from the time when net::netns_ids
      was introduced: there was no net::nsid_lock, and rtnl_lock() was
      mostly needed to order modifications of the nsid idrs of live
      nets, i.e. for:
      	for_each_net(tmp) {
      		...
      		id = __peernet2id(tmp, net);
      		idr_remove(&tmp->netns_ids, id);
      		...
      	}
      
      Since we have net::nsid_lock, the modifications are protected by
      this local lock, and now we may introduce a better scheme for
      netns_ids destruction.
      
      Let's look at the functions peernet2id_alloc() and
      get_net_ns_by_id(). Previous commits taught these functions to
      work well with a dying net acquired from rtnl-unlocked lists, and
      they are the only functions that can hash a net into netns_ids or
      obtain one from there. As is easy to check, the other functions
      operating on netns_ids work with ids, not with net pointers. So,
      we do not need rtnl_lock to synchronize cleanup_net() with any of
      them.
      
      Another property used in the patch is that a net is unhashed from
      net_namespace_list in only one place and by only one process, so
      we avoid an excess rcu_read_lock() or rtnl_lock() while iterating
      over the list in unhash_nsid().
      
      All of the above makes it possible to hold rtnl_lock() only for
      the net->list deletion, and to avoid it completely for netns_ids
      unhashing and destruction (see the sketch after this entry). As
      these two steps may take a long time (e.g., memory allocation to
      send an skb), the patch should improve scalability and
      significantly decrease the time rtnl_lock() is held in
      cleanup_net().
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb07a820
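      
      A simplified sketch of the resulting cleanup_net() shape; the
      unhash_nsid() name is from the commit, while the loop details and
      the 'last' argument are assumptions:
      
      	rtnl_lock();
      	list_for_each_entry(net, &net_kill_list, cleanup_list)
      		list_del_rcu(&net->list);	/* only this needs rtnl */
      	rtnl_unlock();
      
      	/* nsid unhashing now runs without rtnl; each idr is
      	 * protected by its own net->nsid_lock
      	 */
      	list_for_each_entry(net, &net_kill_list, cleanup_list)
      		unhash_nsid(net, last);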
  8. 18 Jan 2018 (2 commits)
    • net: Remove spinlock from get_net_ns_by_id() · 42157277
      Kirill Tkhai committed
      idr_find() is safe under rcu_read_lock(), and maybe_get_net()
      guarantees that the net is alive (see the sketch after this
      entry).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      42157277
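      
      The resulting lookup pattern, roughly (a sketch, not a verbatim
      copy of the function):
      
      	rcu_read_lock();
      	peer = idr_find(&net->netns_ids, id);
      	if (peer)
      		peer = maybe_get_net(peer);	/* NULL if count == 0 */
      	rcu_read_unlock();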
    • net: Fix possible race in peernet2id_alloc() · 0c06bea9
      Kirill Tkhai committed
      peernet2id_alloc() is racy without rtnl_lock(), as
      refcount_read(&peer->count) under net->nsid_lock does not
      guarantee that peer is alive:
      
      rcu_read_lock()
      peernet2id_alloc()                            ..
        spin_lock_bh(&net->nsid_lock)               ..
        refcount_read(&peer->count) (!= 0)          ..
        ..                                          put_net()
        ..                                            cleanup_net()
        ..                                              for_each_net(tmp)
        ..                                                spin_lock_bh(&tmp->nsid_lock)
        ..                                                __peernet2id(tmp, net) == -1
        ..                                                    ..
        ..                                                    ..
          __peernet2id_alloc(alloc == true)                   ..
        ..                                                    ..
      rcu_read_unlock()                                       ..
      ..                                                synchronize_rcu()
      ..                                                kmem_cache_free(net)
      
      After the above, net::netns_ids contains an id pointing to freed
      memory, and any further dereference via that id will operate on
      this freed memory.
      
      Currently, peernet2id_alloc() is used under rtnl_lock() everywhere
      except ovs_vport_cmd_fill_info(), so this race cannot occur. But
      peernet2id_alloc() is a generic interface, and it is better to fix
      it before someone really starts using it in the wrong context (a
      sketch of the fix follows this entry).
      
      v2: Don't place refcount_read(&net->count) under net->nsid_lock
          as suggested by Eric W. Biederman <ebiederm@xmission.com>
      v3: Rebase on top of net-next
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c06bea9
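      
      A sketch of the fix under the stated v2 constraint: take a
      reference with maybe_get_net() and only then hash the peer
      (simplified fragment; declarations, alloc and error paths elided):
      
      	spin_lock_bh(&net->nsid_lock);
      	/* refcount_inc_not_zero() inside; NULL if peer is dying */
      	alive = maybe_get_net(peer) != NULL;
      	if (alive)
      		id = __peernet2id_alloc(net, peer, &alloc);
      	spin_unlock_bh(&net->nsid_lock);
      	if (alive)
      		put_net(peer);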
  9. 16 Jan 2018 (1 commit)
  10. 21 Dec 2017 (1 commit)
    • net: Fix double free and memory corruption in get_net_ns_by_id() · 21b59443
      Eric W. Biederman committed
      (I can trivially verify that the idr_remove in cleanup_net happens
       after the network namespace count has dropped to zero --EWB)
      
      The function get_net_ns_by_id() does not check net::count after it
      has found a peer in the netns_ids idr.
      
      It may dereference a peer after its count has already been finally
      decremented. This leads to a double free and memory corruption:
      
      put_net(peer)                                   rtnl_lock()
      atomic_dec_and_test(&peer->count) [count=0]     ...
      __put_net(peer)                                 get_net_ns_by_id(net, id)
        spin_lock(&cleanup_list_lock)
        list_add(&net->cleanup_list, &cleanup_list)
        spin_unlock(&cleanup_list_lock)
      queue_work()                                      peer = idr_find(&net->netns_ids, id)
        |                                               get_net(peer) [count=1]
        |                                               ...
        |                                               (use after final put)
        v                                               ...
        cleanup_net()                                   ...
          spin_lock(&cleanup_list_lock)                 ...
          list_replace_init(&cleanup_list, ..)          ...
          spin_unlock(&cleanup_list_lock)               ...
          ...                                           ...
          ...                                           put_net(peer)
          ...                                             atomic_dec_and_test(&peer->count) [count=0]
          ...                                               spin_lock(&cleanup_list_lock)
          ...                                               list_add(&net->cleanup_list, &cleanup_list)
          ...                                               spin_unlock(&cleanup_list_lock)
          ...                                             queue_work()
          ...                                           rtnl_unlock()
          rtnl_lock()                                   ...
          for_each_net(tmp) {                           ...
            id = __peernet2id(tmp, peer)                ...
            spin_lock_irq(&tmp->nsid_lock)              ...
            idr_remove(&tmp->netns_ids, id)             ...
            ...                                         ...
            net_drop_ns()                               ...
      	net_free(peer)                            ...
          }                                             ...
        |
        v
        cleanup_net()
          ...
          (Second free of peer)
      
      Also, the put_net() on the right CPU may reorder with the left
      CPU's list_replace_init(&cleanup_list, ..), and then cleanup_list
      will be corrupted.
      
      Since cleanup_net() is executed in a worker thread, while
      put_net(peer) can happen anywhere, there should be enough time for
      a concurrent get_net_ns_by_id() to pick the peer up, so the race
      does not seem unlikely. The patch fixes the problem in the
      standard way (see the sketch after this entry).
      
      (There is also a possible problem in peernet2id_alloc(), which
      requires a net::count check under nsid_lock and
      maybe_get_net(peer); but in the current stable kernel it is used
      under rtnl_lock(), so it should be safe. Open vSwitch has begun to
      use peernet2id_alloc(), and possibly it should be fixed too. Since
      this is not in a stable kernel yet, I'll send a separate message
      to netdev@ later.)
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21b59443
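      
      The standard fix, sketched: replace the unconditional get_net()
      with maybe_get_net(), which refuses a peer whose refcount has
      already hit zero (simplified; at this point the lookup still ran
      under the nsid spinlock):
      
      	spin_lock_bh(&net->nsid_lock);
      	peer = idr_find(&net->netns_ids, id);
      	if (peer)
      		peer = maybe_get_net(peer);	/* was: get_net(peer) */
      	spin_unlock_bh(&net->nsid_lock);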
  11. 05 Nov 2017 (1 commit)
  12. 10 Aug 2017 (2 commits)
  13. 01 Jul 2017 (1 commit)
  14. 20 Jun 2017 (1 commit)
    • netns: add and use net_ns_barrier · 7866cc57
      Florian Westphal committed
      Quoting Joe Stringer:
        If a user loads nf_conntrack_ftp, sends FTP traffic through a
        network namespace, destroys that namespace, and then unloads the
        FTP helper module, the kernel will crash.
      
      Events that lead to the crash:
      1. conntrack is created with ftp helper in netns x
      2. This netns is destroyed
      3. netns destruction is scheduled
      4. netns destruction wq starts, removes netns from global list
      5. ftp helper is unloaded, which resets all helpers of the
      conntracks via for_each_net(); but because the netns is already
      gone from the list, the for_each_net() loop doesn't include it, so
      all of these conntracks are unaffected
      6. helper module unload finishes
      7. netns wq invokes the destructor for the rmmod'ed helper
      (the barrier that closes this window is sketched after this entry)
      
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7866cc57
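      
      The barrier itself is small: cycling the netns mutex guarantees
      that any cleanup_net() which started before the call has finished
      running its exit callbacks. A sketch of the idea (net_mutex as of
      this era; later series rework the locking):
      
      	/* wait for all in-flight netns cleanups to complete, so a
      	 * module being removed cannot have its callbacks invoked by
      	 * a concurrent cleanup_net()
      	 */
      	void net_ns_barrier(void)
      	{
      		mutex_lock(&net_mutex);
      		mutex_unlock(&net_mutex);
      	}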
  15. 11 Jun 2017 (2 commits)
  16. 26 May 2017 (1 commit)
    • net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl committed
      The default value for somaxconn is set in sysctl_core_net_init(),
      but this function is not called when the kernel is configured
      without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP
      connections, because the backlog has zero size. Usually, the user
      ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled, the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long
      (the new home of the initialization is sketched after this entry).
      Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c3f1875
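      
      A sketch of the relocated initialization, assuming it lands in a
      pernet init method that runs regardless of CONFIG_SYSCTL (the
      net_defaults_init_net name matches the later commits above):
      
      	static int __net_init net_defaults_init_net(struct net *net)
      	{
      		net->core.sysctl_somaxconn = SOMAXCONN;
      		return 0;
      	}
      
      	static struct pernet_operations net_defaults_ops = {
      		.init = net_defaults_init_net,
      	};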
  17. 01 May 2017 (1 commit)
  18. 18 Apr 2017 (1 commit)
  19. 14 Apr 2017 (1 commit)
  20. 02 Mar 2017 (1 commit)
  21. 15 Dec 2016 (1 commit)
  22. 04 Dec 2016 (3 commits)
    • netns: fix net_generic() "id - 1" bloat · 6af2d5ff
      Alexey Dobriyan committed
      The net_generic() function is both a) inline and b) used ~600 times.
      
      It has the following code inside
      
      		...
      	ptr = ng->ptr[id - 1];
      		...
      
      "id" is never compile time constant so compiler is forced to subtract 1.
      And those decrements or LEA [r32 - 1] instructions add up.
      
      We also start ids from 1 to catch bugs where a pernet subsystem id
      is uninitialized and still 0. This is a fairly pointless idea in
      general (nothing will work, or there is immediate interference
      with the first registered subsystem), but it hints at what needs
      to be done for code size reduction.
      
      Namely, overlaying the allocation of the pointer array with the
      fixed part at the start of the structure, and using the usual
      base-0 addressing (see the sketch after this entry).
      
      Ids are just cookies; their exact values do not matter, so let's
      start with 3 on x86_64.
      
      Code size savings (oh boy): -4.2 KB
      
      As usual, ignore the initial compiler stupidity part of the table.
      
      	add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208)
      	function                                     old     new   delta
      	tipc_nametbl_insert_publ                    1250    1270     +20
      	nlmclnt_lookup_host                          686     703     +17
      	nfsd4_encode_fattr                          5930    5941     +11
      	nfs_get_client                              1050    1061     +11
      	register_pernet_operations                   333     342      +9
      	tcf_mirred_init                              843     849      +6
      	tcf_bpf_init                                1143    1149      +6
      	gss_setup_upcall                             990     994      +4
      	idmap_name_to_id                             432     434      +2
      	ops_init                                     274     275      +1
      	nfsd_inject_forget_client                    259     260      +1
      	nfs4_alloc_client                            612     613      +1
      	tunnel_key_walker                            164     163      -1
      
      		...
      
      	tipc_bcbase_select_primary                   392     360     -32
      	mac80211_hwsim_new_radio                    2808    2767     -41
      	ipip6_tunnel_ioctl                          2228    2186     -42
      	tipc_bcast_rcv                               715     672     -43
      	tipc_link_build_proto_msg                   1140    1089     -51
      	nfsd4_lock                                  3851    3796     -55
      	tipc_mon_rcv                                1012     956     -56
      	Total: Before=156643951, After=156639743, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6af2d5ff
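      
      A sketch of the overlay and the resulting accessor; the struct
      follows the idea in the commit text (the exact header layout is an
      assumption), and valid ids simply start after the header slots:
      
      	struct net_generic {
      		union {
      			struct {
      				unsigned int len;
      				struct rcu_head rcu;
      			} s;		/* first 3 slots on x86_64 */
      			void *ptr[0];
      		};
      	};
      
      	static inline void *net_generic(const struct net *net,
      					unsigned int id)
      	{
      		void *ptr;
      
      		rcu_read_lock();
      		ptr = rcu_dereference(net->gen)->ptr[id]; /* no "- 1" */
      		rcu_read_unlock();
      		return ptr;
      	}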
    • netns: add dummy struct inside "struct net_generic" · 9bfc7b99
      Alexey Dobriyan committed
      This is a precursor to fixing the "[id - 1]" bloat inside
      net_generic().
      
      The name "s" is chosen to complement the name "u" often used for
      dummy unions.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bfc7b99
    • netns: publish net_generic correctly · 1a9a0592
      Alexey Dobriyan committed
      Publishing the net_generic pointer is done with a silly mistake:
      the new array is published BEFORE the freshly acquired pernet
      subsystem pointer is set (the corrected order is sketched after
      this entry):
      	memcpy
      	rcu_assign_pointer
      	kfree_rcu
      	ng->ptr[id - 1] = data;
      
      This bug was introduced with commit dec827d1
      ("[NETNS]: The generic per-net pointers.") in the glorious days of
      chopping the networking stack into proper containers, 8.5 years
      ago (whee...).
      
      How did it not trigger for so long?
      Well, you need a quite specific set of conditions:
      
      *) the race window opens once per pernet subsystem addition
         (read: modprobe or boot)
      
      *) not every pernet subsystem is eligible (it needs ->id and ->size)
      
      *) not every pernet subsystem is vulnerable (it needs incorrect or
         absent ordering of register_pernet_subsys() and actually using
         net_generic())
      
      *) to hide the bug even more, the default is to preallocate 13
         pointers, which is actually quite a lot. You need IPv6,
         netfilter, bridging, etc. loaded together to trigger
         reallocation in the first place. Trimmed-down configs are OK.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a9a0592
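      
      A sketch of the corrected publication order, simplified from the
      sequence quoted in the commit (lengths and error paths elided):
      
      	memcpy(&ng->ptr, &old_ng->ptr, old_len);
      	ng->ptr[id - 1] = data;			/* set first ...   */
      
      	rcu_assign_pointer(net->gen, ng);	/* ... then publish */
      	kfree_rcu(old_ng, rcu);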
  23. 18 Nov 2016 (2 commits)
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and thus is
      an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data mixed with pointers via array
      indexing, or via offsets added to or subtracted from pointers, is
      preferred to signed 32-bit data.
      
      An "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the
      variable is unsigned, because x86_64 zero-extends by default.
      
      Now, there is the net_generic() function which, you guessed it,
      uses "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch snips ~1730 bytes off an allyesconfig kernel (without
      all the junk messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately, some functions actually grow bigger. This is a
      seemingly random artifact of code generation, with the register
      allocator being used differently: gcc decides that some variable
      needs to live in the new r8+ registers, and every access now
      requires a REX prefix. Or it is shifted into r12, so the [r12+0]
      addressing mode has to be used, which is longer than [r8].
      
      However, the overall balance is in the negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7d03a00
    • net: check dead netns for peernet2id_alloc() · cfc44a4d
      WANG Cong committed
      Andrei reports that we still allocate a netns ID from the idr
      after we have destroyed it in cleanup_net().
      
      cleanup_net():
        ...
        idr_destroy(&net->netns_ids);
        ...
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list);
            -> rollback_registered_many()
              -> rtmsg_ifinfo_build_skb()
               -> rtnl_fill_ifinfo()
                 -> peernet2id_alloc()
      
      After that point we should not even access net->netns_ids; we
      should check for the death of the current netns as early as we can
      in peernet2id_alloc() (see the sketch after this entry).
      
      For net-next we can consider avoiding sending the rtmsg entirely;
      that is a good optimization for the netns teardown path.
      
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reported-by: Andrei Vagin <avagin@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfc44a4d
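      
      A sketch of the early-death check, based on the commit text (at
      this point the netns refcount was still a plain atomic_t; the
      body of the function is elided):
      
      	int peernet2id_alloc(struct net *net, struct net *peer)
      	{
      		/* netns is already dying: its netns_ids idr may be
      		 * destroyed, so do not touch it
      		 */
      		if (atomic_read(&net->count) == 0)
      			return NETNSA_NSID_NOT_ASSIGNED;
      		/* ... normal lookup/allocation under nsid_lock ... */
      	}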