1. 21 Sep 2017, 21 commits
  2. 20 Sep 2017, 19 commits
    • D
      Merge branch 'net-speedup-netns-create-delete-time' · 8ca712c3
      Committed by David S. Miller
      Eric Dumazet says:
      
      ====================
      net: speedup netns create/delete time
      
      When rate of netns creation/deletion is high enough,
      we observe softlockups in cleanup_net() caused by huge list
      of netns and way too many rcu_barrier() calls.
      
      This patch series does some optimizations in kobject,
      and adds batching to tunnels so that netns dismantles are
      less costly.
      
      IPv6 addrlabels also get a per netns list, and tcp_metrics
      also benefit from batch flushing.
      
      This gives me one order of magnitude gain.
      (~50 ms -> ~5 ms for one netns create/delete pair)
      
      Tested:
      
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch series :
      
      $ time ./add_del_unshare.sh
      net_namespace        116    258   5504    1    2 : tunables    8    4    0 : slabdata    116    258      0
      
      real	3m24.910s
      user	0m0.747s
      sys	0m43.162s
      
      After :
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ca712c3
    • E
      ipv4: speedup ipv6 tunnels dismantle · 64bc1781
      Committed by Eric Dumazet
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real    1m38.965s
      user    0m0.688s
      sys     0m37.017s
      
      After patch:
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64bc1781
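
      A minimal kernel-style sketch of the batching pattern described above; everything
      prefixed example_ is illustrative rather than the actual ip_tunnel code (the real
      exit_batch() only queues the tunnel devices it owns, not every device in the netns):

      #include <linux/netdevice.h>
      #include <linux/rtnetlink.h>
      #include <net/net_namespace.h>

      /* Dismantle all dying netns in one round: queue every device to kill, then
       * let a single rtnl_lock()/unregister_netdevice_many()/rtnl_unlock()
       * sequence tear them down, instead of paying that cost once per netns. */
      static void example_tunnel_exit_batch(struct list_head *net_exit_list)
      {
              struct net_device *dev, *tmp;
              struct net *net;
              LIST_HEAD(kill_list);

              rtnl_lock();
              list_for_each_entry(net, net_exit_list, exit_list)
                      for_each_netdev_safe(net, dev, tmp)
                              unregister_netdevice_queue(dev, &kill_list);
              unregister_netdevice_many(&kill_list);
              rtnl_unlock();
      }

      static struct pernet_operations example_tunnel_net_ops = {
              /* .init/.id/.size omitted; only the batched exit path is shown */
              .exit_batch = example_tunnel_exit_batch,
      };

      Registering this via register_pernet_device() lets cleanup_net() hand the whole
      batch of dying netns to exit_batch() at once, which is where the time savings in
      the numbers above come from.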
    • E
      ipv6: speedup ipv6 tunnels dismantle · bb401cae
      Committed by Eric Dumazet
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0
      
      real    3m25.292s
      user    0m0.644s
      sys     0m40.153s
      
      After patch:
      
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real	1m38.965s
      user	0m0.688s
      sys	0m37.017s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb401cae
    • E
      tcp: batch tcp_net_metrics_exit · 789e6ddb
      Committed by Eric Dumazet
      When dealing with a list of dismantling netns, we can scan
      tcp_metrics once, saving cpu cycles.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      789e6ddb
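
      A condensed sketch of the single-scan idea with placeholder types (the real code
      operates on struct tcp_metrics_block and its hash table in net/ipv4/tcp_metrics.c):

      #include <linux/list.h>
      #include <linux/slab.h>
      #include <net/net_namespace.h>

      /* Placeholder standing in for struct tcp_metrics_block. */
      struct example_metric {
              struct example_metric *next;
              struct net *net;
      };

      /* One pass over the whole hash for the whole batch of dying netns: each
       * entry is compared against the dismantling list, so the table is walked
       * once instead of once per netns. */
      static void example_metrics_exit_batch(struct list_head *net_exit_list,
                                             struct example_metric **hash,
                                             unsigned int hash_size)
      {
              unsigned int i;

              for (i = 0; i < hash_size; i++) {
                      struct example_metric **pp = &hash[i];
                      struct example_metric *m;

                      while ((m = *pp) != NULL) {
                              struct net *net;
                              bool dying = false;

                              list_for_each_entry(net, net_exit_list, exit_list)
                                      if (net_eq(m->net, net)) {
                                              dying = true;
                                              break;
                                      }
                              if (dying) {
                                      *pp = m->next;
                                      kfree(m);
                              } else {
                                      pp = &m->next;
                              }
                      }
              }
      }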
    • E
      ipv6: addrlabel: per netns list · a90c9347
      Committed by Eric Dumazet
      Having a global list of labels does not scale to thousands of
      netns in the cloud era. This causes quadratic behavior on
      netns creation and deletion.
      
      It is time to switch to a per-netns list of ~10 labels.
      
      Tested:
      
      $ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]
      
      real    0m20.837s # instead of 0m24.227s
      user    0m0.328s
      sys     0m20.338s # instead of 0m23.753s
      
          16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
          12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           5.78%       ip  [kernel.kallsyms]  [k] memset_erms
           5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
           4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
           3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range
           1.77%       ip  [kernel.kallsyms]  [k] __wake_up
           1.69%       ip  [kernel.kallsyms]  [k] strlen
           1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common
           1.09%       ip  [kernel.kallsyms]  [k] insert_header
           1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap
           1.01%       ip  [kernel.kallsyms]  [k] consume_skb
           0.98%       ip  [kernel.kallsyms]  [k] netlink_trim
           0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling
           0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages
           0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a90c9347
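
      A sketch of the per-netns list idea; the real patch embeds the label table directly
      in struct netns_ipv6, whereas this illustration hangs the same kind of small list
      off the generic per-netns storage:

      #include <linux/list.h>
      #include <net/net_namespace.h>
      #include <net/netns/generic.h>

      struct example_addrlabel_net {
              struct hlist_head head;   /* ~10 labels per netns, not one global list */
      };

      static unsigned int example_addrlabel_net_id;

      static int example_addrlabel_net_init(struct net *net)
      {
              struct example_addrlabel_net *ln =
                      net_generic(net, example_addrlabel_net_id);

              INIT_HLIST_HEAD(&ln->head);
              /* default labels for this netns would be added here */
              return 0;
      }

      static void example_addrlabel_net_exit(struct net *net)
      {
              /* flushing touches only this netns's handful of labels, so
               * teardown no longer scales with the total number of netns */
      }

      static struct pernet_operations example_addrlabel_net_ops = {
              .init = example_addrlabel_net_init,
              .exit = example_addrlabel_net_exit,
              .id   = &example_addrlabel_net_id,
              .size = sizeof(struct example_addrlabel_net),
      };

      Registering this with register_pernet_subsys() gives every new netns its own list
      at creation time, which is what removes the quadratic behavior.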
    • E
      kobject: factorize skb setup in kobject_uevent_net_broadcast() · d464e84e
      Committed by Eric Dumazet
      We can build one skb and let it be cloned in netlink.
      
      This is much faster and uses less memory (all clones will
      share the same skb->head).
      
      Tested:
      
      time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]
      
      real    0m24.227s # instead of 0m52.554s
      user    0m0.329s
      sys 0m23.753s # instead of 0m51.375s
      
          14.77%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
          14.56%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
          11.65%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           6.19%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           5.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           4.97%       ip  [kernel.kallsyms]  [k] memset_erms
           4.67%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
           4.41%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           3.59%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
           3.13%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           1.55%       ip  [kernel.kallsyms]  [k] __wake_up
           1.20%       ip  [kernel.kallsyms]  [k] strlen
           1.03%       ip  [kernel.kallsyms]  [k] __wake_up_common
           0.93%       ip  [kernel.kallsyms]  [k] consume_skb
           0.92%       ip  [kernel.kallsyms]  [k] netlink_trim
           0.87%       ip  [kernel.kallsyms]  [k] insert_header
           0.63%       ip  [kernel.kallsyms]  [k] unmap_page_range
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d464e84e
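
      A sketch of the build-once, clone-per-socket approach (the per-netns uevent socket
      list handling in lib/kobject_uevent.c is reduced to a plain array here for brevity):

      #include <linux/netlink.h>
      #include <linux/skbuff.h>
      #include <net/sock.h>

      /* Format the uevent payload into one skb, then hand each uevent socket a
       * clone; every clone shares the same skb->head, so the allocation and
       * copy of the payload happen exactly once. */
      static int example_broadcast_uevent(struct sock **socks, int nr_socks,
                                          const void *buf, unsigned int len)
      {
              struct sk_buff *skb;
              int i, err = 0;

              skb = alloc_skb(len, GFP_KERNEL);
              if (!skb)
                      return -ENOMEM;
              skb_put_data(skb, buf, len);

              for (i = 0; i < nr_socks; i++) {
                      struct sk_buff *clone = skb_clone(skb, GFP_KERNEL);

                      if (!clone) {
                              err = -ENOMEM;
                              continue;
                      }
                      /* group 1 is the uevent multicast group */
                      err = netlink_broadcast(socks[i], clone, 0, 1, GFP_KERNEL);
              }
              consume_skb(skb);
              return err;
      }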
    • E
      kobject: copy env blob in one go · 4a336a23
      Committed by Eric Dumazet
      No need to iterate over the strings; just copy in one efficient memcpy() call.
      
      Tested:
      time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
      [ perf record: Woken up 10 times to write data ]
      [ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]
      
      real    0m52.554s  # instead of 1m7.492s
      user    0m0.309s
      sys 0m51.375s # instead of 1m6.875s
      
           9.88%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
           8.86%       ip  [kernel.kallsyms]  [k] string
           7.37%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
           5.68%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           5.52%       ip  [kernel.kallsyms]  [k] memcpy_erms
           4.76%       ip  [kernel.kallsyms]  [k] __alloc_skb
           4.54%       ip  [kernel.kallsyms]  [k] vsnprintf
           3.94%       ip  [kernel.kallsyms]  [k] format_decode
           3.80%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node_trace
           3.71%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node
           3.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           3.38%       ip  [kernel.kallsyms]  [k] strlen
           2.65%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           2.20%       ip  [kernel.kallsyms]  [k] kfree
           2.09%       ip  [kernel.kallsyms]  [k] memset_erms
           2.07%       ip  [kernel.kallsyms]  [k] ___cache_free
           1.95%       ip  [kernel.kallsyms]  [k] kmem_cache_free
           1.91%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           1.45%       ip  [kernel.kallsyms]  [k] ksize
           1.25%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           1.00%       ip  [kernel.kallsyms]  [k] widen_string
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a336a23
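
      A sketch of the one-shot copy; struct kobj_uevent_env really does carry the
      pre-formatted buf/buflen used here, and the caller is assumed to have sized the
      skb for env->buflen:

      #include <linux/kobject.h>
      #include <linux/skbuff.h>
      #include <linux/string.h>

      /* The environment strings are already laid out back-to-back
       * (NUL-separated) in env->buf, so one memcpy() replaces the old
       * per-string loop. */
      static void example_copy_env_blob(struct sk_buff *skb,
                                        const struct kobj_uevent_env *env)
      {
              memcpy(skb_put(skb, env->buflen), env->buf, env->buflen);
      }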
    • E
      kobject: add kobject_uevent_net_broadcast() · 16dff336
      Committed by Eric Dumazet
      This removes some #ifdef pollution and will ease follow-up patches.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16dff336
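
      The shape of the clean-up, sketched with a simplified parameter list: the CONFIG_NET
      conditional moves into one helper so kobject_uevent_env() keeps a single,
      unconditional call site:

      #include <linux/kobject.h>

      #ifdef CONFIG_NET
      static int example_uevent_net_broadcast(struct kobject *kobj,
                                              struct kobj_uevent_env *env)
      {
              /* netlink skb construction and broadcast live here */
              return 0;
      }
      #else
      static int example_uevent_net_broadcast(struct kobject *kobj,
                                              struct kobj_uevent_env *env)
      {
              return 0;
      }
      #endif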
    • C
      net_sched: no need to free qdisc in RCU callback · 752fbcc3
      Committed by Cong Wang
      The gen estimator was rewritten in commit 1c0d32fd
      ("net_sched: gen_estimator: complete rewrite of rate estimators"),
      so the caller no longer needs to wait for a grace period. This
      patch gets rid of it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      752fbcc3
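
      A before/after sketch of the change with generic names (the real code is
      qdisc_destroy() in net/sched/sch_generic.c):

      #include <linux/rcupdate.h>
      #include <linux/slab.h>

      struct example_qdisc {
              struct rcu_head rcu_head;
              /* ... */
      };

      /* Before: the free was deferred through RCU because the old rate
       * estimator could still be reading the qdisc on an RCU-protected path. */
      static void example_qdisc_rcu_free(struct rcu_head *head)
      {
              kfree(container_of(head, struct example_qdisc, rcu_head));
      }

      static void example_destroy_before(struct example_qdisc *q)
      {
              call_rcu(&q->rcu_head, example_qdisc_rcu_free);
      }

      /* After: the rewritten gen_estimator (1c0d32fd) no longer dereferences
       * the qdisc from RCU readers, so it can be freed immediately. */
      static void example_destroy_after(struct example_qdisc *q)
      {
              kfree(q);
      }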
    • J
      team: fall back to hash if table entry is empty · bd7d2106
      Committed by Jim Hanko
      If the hash to port mapping table does not have a valid port (i.e. when
      a port goes down), fall back to the simple hashing mechanism to avoid
      dropping packets.
      Signed-off-by: Jim Hanko <hanko@drivescale.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd7d2106
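
      The fallback logic, sketched with illustrative types (the real change lives in the
      loadbalance mode's tx port selection in drivers/net/team/team_mode_loadbalance.c):

      #include <linux/types.h>

      struct example_port {
              bool up;
      };

      struct example_lb {
              struct example_port *hash_map[256];   /* hash -> port table */
              struct example_port *ports[8];        /* active ports */
              unsigned int port_count;
      };

      /* If the mapping table has no usable port for this hash (e.g. the port
       * just went down), fall back to simple modulo hashing over the active
       * ports so the packet is transmitted instead of dropped. */
      static struct example_port *
      example_lb_select_tx_port(struct example_lb *lb, unsigned char hash)
      {
              struct example_port *port = lb->hash_map[hash];

              if (port && port->up)
                      return port;
              if (!lb->port_count)
                      return NULL;
              return lb->ports[hash % lb->port_count];
      }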
    • D
      Merge branch 'test_rhashtable-dont-allocate-huge-static-array' · d3b55771
      Committed by David S. Miller
      Florian Westphal says:
      
      ====================
      test_rhashtable: don't allocate huge static array
      
      Add a test case for the rhlist interface.
      While at it, clean up the current rhashtable test a bit and add a check
      for max_size support.
      
      No changes since v1, except in the last patch: the kbuild robot
      complained about a large on-stack allocation caused by struct rhltable
      when lockdep is enabled.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3b55771
    • F
      test_rhashtable: add test case for rhl_table interface · cdd4de37
      Committed by Florian Westphal
      Also test rhltable. rhltable remove operations are slow, as
      deletions require a list walk, so test with 1/16th of the given
      entry count to get a run duration similar to the rhashtable one.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdd4de37
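
      A minimal sketch of the rhlist interface the new test exercises (the real test
      object and parameters in lib/test_rhashtable.c differ):

      #include <linux/rhashtable.h>
      #include <linux/slab.h>

      struct example_obj {
              int key;
              struct rhlist_head list_node;   /* rhlist allows duplicate keys */
      };

      static const struct rhashtable_params example_rhl_params = {
              .key_len     = sizeof(int),
              .key_offset  = offsetof(struct example_obj, key),
              .head_offset = offsetof(struct example_obj, list_node),
      };

      static int example_rhltable_test(void)
      {
              struct rhltable hlt;
              struct example_obj *obj;
              int err;

              err = rhltable_init(&hlt, &example_rhl_params);
              if (err)
                      return err;

              obj = kzalloc(sizeof(*obj), GFP_KERNEL);
              if (!obj) {
                      rhltable_destroy(&hlt);
                      return -ENOMEM;
              }
              obj->key = 1;

              err = rhltable_insert(&hlt, &obj->list_node, example_rhl_params);
              if (!err)
                      /* removal has to walk the per-key list, which is why the
                       * test runs with 1/16th of the usual entry count */
                      err = rhltable_remove(&hlt, &obj->list_node, example_rhl_params);

              kfree(obj);
              rhltable_destroy(&hlt);
              return err;
      }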
    • F
      test_rhashtable: add a check for max_size · a6359bd8
      Committed by Florian Westphal
      Add a test that tries to insert more than max_size elements.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6359bd8
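
      A sketch of what such a check can look like; the thresholds and error handling in
      the real test differ:

      #include <linux/rhashtable.h>
      #include <linux/slab.h>

      struct example_entry {
              int key;
              struct rhash_head node;
      };

      static void example_free_entry(void *ptr, void *arg)
      {
              kfree(ptr);
      }

      /* Insert far more elements than .max_size allows; once the table may no
       * longer grow, inserts are expected to fail, and the test treats "never
       * fails" as a bug. */
      static int example_max_size_test(void)
      {
              const struct rhashtable_params params = {
                      .key_len     = sizeof(int),
                      .key_offset  = offsetof(struct example_entry, key),
                      .head_offset = offsetof(struct example_entry, node),
                      .max_size    = 32,
              };
              struct rhashtable ht;
              unsigned int failed = 0;
              int i, err;

              err = rhashtable_init(&ht, &params);
              if (err)
                      return err;

              for (i = 0; i < 1024; i++) {
                      struct example_entry *e = kzalloc(sizeof(*e), GFP_KERNEL);

                      if (!e)
                              break;
                      e->key = i;
                      if (rhashtable_insert_fast(&ht, &e->node, params)) {
                              kfree(e);
                              failed++;       /* expected once past max_size */
                      }
              }

              rhashtable_free_and_destroy(&ht, example_free_entry, NULL);
              return failed ? 0 : -EINVAL;
      }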
    • F
      test_rhashtable: don't use global entries variable · f651616e
      Committed by Florian Westphal
      Pass the number of entries to test as an argument instead.
      A follow-up patch will add an rhlist test case; rhlist delete operations
      are slow, so we need to use a smaller number to test it.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f651616e
    • F
      7e936bd7
    • D
      Merge branch 'dsa-b53-bcm_sf2-cleanups' · 3d5cc728
      Committed by David S. Miller
      Florian Fainelli says:
      
      ====================
      net: dsa: b53/bcm_sf2 cleanups
      
      This patch series is a first-pass set of clean-ups to reduce the number of LOCs
      between b53 and bcm_sf2 by sharing as many functions as possible.
      
      There are a number of additional clean-ups queued up locally that require more
      thorough testing.
      
      Changes in v3:
      
      - remove one extra argument for the b53_build_io_op macro (David Laight)
      - added additional Reviewed-by tags from Vivien
      
      Changes in v2:
      
      - added Reviewed-by tags from Vivien
      - added a missing EXPORT_SYMBOL() in patch 8
      - fixed a typo in patch 5
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d5cc728
    • F
      net: dsa: bcm_sf2: Utilize b53_{enable, disable}_port · f86ad77f
      Committed by Florian Fainelli
      Export b53_{enable,disable}_port and use these two functions in
      bcm_sf2_port_setup and bcm_sf2_port_disable. The generic functions
      cannot be used without wrapping because we need to manage additional
      switch integration details (PHY, Broadcom tag etc.).
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f86ad77f
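
      A sketch of the wrapping described above; the b53_enable_port() prototype is
      assumed to match the newly exported helper, and the SF2-only steps are only
      hinted at:

      #include <net/dsa.h>

      /* Assumed to match the helper exported by the shared b53 driver. */
      int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy);

      /* bcm_sf2 reuses the generic port enable, then layers the switch
       * integration details the shared code cannot know about on top. */
      static int example_bcm_sf2_port_setup(struct dsa_switch *ds, int port,
                                            struct phy_device *phy)
      {
              int ret = b53_enable_port(ds, port, phy);

              if (ret)
                      return ret;

              /* SF2-specific extras go here: per-port interrupts, Broadcom
               * tag handling, PHY/power management, ... */
              return 0;
      }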
    • F
      net: dsa: bcm_sf2: Use SF2_NUM_EGRESS_QUEUES for CFP · 152b6fd6
      Committed by Florian Fainelli
      The magic number 8 in 3 locations in bcm_sf2_cfp.c actually designates
      the number of switch port egress queues, so use that define instead of
      open-coding it.
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      152b6fd6
    • F
      net: dsa: b53: Export b53_imp_vlan_setup() · aac02867
      Committed by Florian Fainelli
      bcm_sf2 and b53 do exactly the same thing, so share that piece.
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aac02867