1. 21 Jan, 2021 2 commits
  2. 20 Jan, 2021 2 commits
  3. 19 Jan, 2021 1 commit
  4. 15 Jan, 2021 1 commit
  5. 14 Jan, 2021 1 commit
  6. 10 Jan, 2021 1 commit
  7. 09 Jan, 2021 4 commits
  8. 08 Jan, 2021 1 commit
  9. 17 Dec, 2020 1 commit
  10. 15 Dec, 2020 1 commit
  11. 09 Dec, 2020 1 commit
    • xdp: Remove the xdp_attachment_flags_ok() callback · 998f1729
      Committed by Toke Høiland-Jørgensen
      Since commit 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF
      programs in net_device"), the XDP program attachment info is now maintained
      in the core code. This interacts badly with the xdp_attachment_flags_ok()
      check that prevents unloading an XDP program with different load flags than
      it was loaded with. In practice, two kinds of failures are seen:
      
      - An XDP program loaded without specifying a mode (and which then ends up
        in driver mode) cannot be unloaded if the program mode is specified on
        unload.
      
      - The dev_xdp_uninstall() hook always calls the driver callback with the
        mode set to the type of the program but an empty flags argument, which
        means the flags_ok() check prevents the program from being removed,
        leading to bpf prog reference leaks.
      
      The original reason this check was added was to avoid ambiguity when
      multiple programs were loaded. With the way the checks are done in the core
      now, this is quite simple to enforce in the core code, so let's add a check
      there and get rid of the xdp_attachment_flags_ok() callback entirely.
      
      Fixes: 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
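The core-side check described in the xdp commit above can be illustrated with a small userspace sketch. The constants and the function name `check_mode_flags()` are hypothetical stand-ins, not the kernel's code. The sketch assumes the rule is: at most one mode bit may be set, and an unset mode is rejected as ambiguous when more than one program is attached.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mode bits for this sketch (one bit per attach mode). */
#define MODE_SKB  (1U << 1)
#define MODE_DRV  (1U << 2)
#define MODE_HW   (1U << 3)
#define MODE_MASK (MODE_SKB | MODE_DRV | MODE_HW)

/* Count the set bits in v. */
static inline int count_bits(uint32_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Assumed rule: reject requests naming several modes at once, and
 * reject an unset mode when more than one program is attached,
 * because the target program would be ambiguous.
 */
static inline int check_mode_flags(uint32_t flags, int attached_progs)
{
	int num_modes = count_bits(flags & MODE_MASK);

	if (num_modes > 1)
		return -EINVAL;
	if (num_modes == 0 && attached_progs > 1)
		return -EINVAL;
	return 0;
}
```

With a single attached program, an unload request without a mode passes this check, which corresponds to the first failure case listed in the commit message.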
  12. 02 Dec, 2020 1 commit
  13. 01 Dec, 2020 3 commits
  14. 28 Nov, 2020 1 commit
  15. 25 Nov, 2020 2 commits
  16. 24 Nov, 2020 1 commit
  17. 18 Nov, 2020 1 commit
  18. 10 Nov, 2020 1 commit
  19. 03 Nov, 2020 1 commit
  20. 28 Oct, 2020 1 commit
  21. 25 Oct, 2020 1 commit
    • random32: add noise from network and scheduling activity · 3744741a
      Committed by Willy Tarreau
      With the removal of the interrupt perturbations in the previous random32
      change (random32: make prandom_u32() output unpredictable), the PRNG
      has become 100% deterministic again. While SipHash is expected to be
      far more robust against brute force than the previous Tausworthe LFSR,
      there is still a risk that anyone who gains even temporary access to
      the PRNG's internal state can predict all subsequent draws until
      the next reseed (roughly every minute). This may happen through a
      side-channel attack or any data leak.
      
      This patch restores the spirit of commit f227e3ec ("random32: update
      the net random state on interrupt and activity") in that it perturbs
      the PRNG's internal state using externally collected noise. However,
      it does not pick that noise from the random pool's bits, nor upon
      interrupt; instead it combines a few elements along the Tx path
      that are collectively hard to predict, such as the dev, skb and txq
      pointers, the packet length and jiffies values. These are combined
      using a single round of SipHash into a single long variable that is
      mixed with the net_rand_state upon each invocation.
      
      The operation was inlined because it produces very small and efficient
      code, typically 3 xor, 2 add and 2 rol. The measured performance is
      the same as (even very slightly better than) before the switch to
      SipHash: on a 6-core, 12-thread Core i7-8700k equipped with a 40G NIC
      (i40e), the connection rate dropped from 556k/s to 555k/s while the
      SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
      
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      Cc: George Spelvin <lkml@sdf.org>
      Cc: Amit Klein <aksecurity@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
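The noise-mixing idea in the random32 commit above can be sketched in userspace C. This is only an illustration, not the kernel's implementation: `struct noise_state` and `add_noise()` are hypothetical names, and the sketch runs a full standard SipHash round over four 64-bit words, then keeps one word as the accumulated noise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical four-word state used only by this sketch. */
struct noise_state {
	uint64_t v0, v1, v2, v3;
};

static inline uint64_t rotl64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* One standard SipHash round (SipRound): adds, rotates and xors only. */
static inline void sipround(struct noise_state *s)
{
	s->v0 += s->v1; s->v1 = rotl64(s->v1, 13); s->v1 ^= s->v0;
	s->v0 = rotl64(s->v0, 32);
	s->v2 += s->v3; s->v3 = rotl64(s->v3, 16); s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rotl64(s->v3, 21); s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rotl64(s->v1, 17); s->v1 ^= s->v2;
	s->v2 = rotl64(s->v2, 32);
}

/*
 * Fold four hard-to-predict values (for example pointer values, a
 * packet length and a tick counter) into an accumulated noise word.
 */
static inline uint64_t add_noise(uint64_t *noise, uint64_t a, uint64_t b,
				 uint64_t c, uint64_t d)
{
	struct noise_state s = { a ^ *noise, b, c, d };

	sipround(&s);
	*noise = s.v3;
	return *noise;
}
```

Because every step of the round is invertible, two different input states always produce two different output states. The mixing is still fully deterministic for identical inputs; any unpredictability comes from the inputs themselves.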
  22. 19 Oct, 2020 1 commit
    • net: core: use list_del_init() instead of list_del() in netdev_run_todo() · 0e8b8d6a
      Committed by Taehee Yoo
      dev->unlink_list is reused unless dev is deleted, so list_del() should
      not be used on it. After list_del(), dev->unlink_list cannot be reused,
      which breaks the dev->nested_level update logic.
      To fix this bug, use list_del_init() instead of list_del().
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ip link set bond0 master bond1
          ip link set bond0 nomaster
          ip link set bond1 master bond0
          ip link set bond1 nomaster
      
      Splat looks like:
      [  255.750458][ T1030] ============================================
      [  255.751967][ T1030] WARNING: possible recursive locking detected
      [  255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted
      [  255.754553][ T1030] --------------------------------------------
      [  255.756047][ T1030] ip/1030 is trying to acquire lock:
      [  255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150
      [  255.760056][ T1030]
      [  255.760056][ T1030] but task is already holding lock:
      [  255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.764581][ T1030]
      [  255.764581][ T1030] other info that might help us debug this:
      [  255.766645][ T1030]  Possible unsafe locking scenario:
      [  255.766645][ T1030]
      [  255.768566][ T1030]        CPU0
      [  255.769415][ T1030]        ----
      [  255.770259][ T1030]   lock(&dev_addr_list_lock_key/1);
      [  255.771629][ T1030]   lock(&dev_addr_list_lock_key/1);
      [  255.772994][ T1030]
      [  255.772994][ T1030]  *** DEADLOCK ***
      [  255.772994][ T1030]
      [  255.775091][ T1030]  May be due to missing lock nesting notation
      [  255.775091][ T1030]
      [  255.777182][ T1030] 2 locks held by ip/1030:
      [  255.778299][ T1030]  #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0
      [  255.780600][ T1030]  #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.783411][ T1030]
      [  255.783411][ T1030] stack backtrace:
      [  255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772
      [  255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  255.789030][ T1030] Call Trace:
      [  255.789850][ T1030]  dump_stack+0x99/0xd0
      [  255.790882][ T1030]  __lock_acquire.cold.71+0x166/0x3cc
      [  255.792285][ T1030]  ? register_lock_class+0x1a30/0x1a30
      [  255.793619][ T1030]  ? rcu_read_lock_sched_held+0x91/0xc0
      [  255.794963][ T1030]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  255.796246][ T1030]  lock_acquire+0x1b8/0x850
      [  255.797332][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
      [  255.798624][ T1030]  ? bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.800039][ T1030]  ? check_flags+0x50/0x50
      [  255.801143][ T1030]  ? lock_contended+0xd80/0xd80
      [  255.802341][ T1030]  _raw_spin_lock_nested+0x2e/0x70
      [  255.803592][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
      [  255.804897][ T1030]  dev_mc_sync_multiple+0xc2/0x150
      [  255.806168][ T1030]  bond_enslave+0x3d58/0x43e0 [bonding]
      [  255.807542][ T1030]  ? __lock_acquire+0xe53/0x51b0
      [  255.808824][ T1030]  ? bond_update_slave_arr+0xdc0/0xdc0 [bonding]
      [  255.810451][ T1030]  ? check_chain_key+0x236/0x5e0
      [  255.811742][ T1030]  ? mutex_is_locked+0x13/0x50
      [  255.812910][ T1030]  ? rtnl_is_locked+0x11/0x20
      [  255.814061][ T1030]  ? netdev_master_upper_dev_get+0xf/0x120
      [  255.815553][ T1030]  do_setlink+0x94c/0x3040
      [ ... ]
      
      Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com
      Fixes: 1fc70edb ("net: core: add nested_level variable in net_device")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
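The difference between the two deletion helpers in the commit above can be shown with a minimal userspace model of a circular doubly linked list. The functions below are simplified stand-ins that reuse the kernel helpers' names; NULL is used here in place of the kernel's poison pointer values. The sketch assumes the re-queueing code decides whether a node may be added again by testing list_empty() on the node itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A head (or a re-initialised node) is empty when it points to itself. */
static inline bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static inline void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* list_del()-like: the removed node keeps invalid pointers. */
static inline void list_del(struct list_head *e)
{
	list_unlink(e);
	e->next = NULL;
	e->prev = NULL;
}

/* list_del_init()-like: the removed node becomes a valid empty list. */
static inline void list_del_init(struct list_head *e)
{
	list_unlink(e);
	init_list_head(e);
}
```

After list_del(), list_empty() on the node returns false, so a guard of the form "add only if empty" would never queue the node again. After list_del_init(), the same guard succeeds and the node can be reused.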
  23. 14 Oct, 2020 1 commit
  24. 12 Oct, 2020 1 commit
    • bpf: Add redirect_peer helper · 9aa1206e
      Committed by Daniel Borkmann
      Add an efficient ingress to ingress netns switch that can be used out of tc BPF
      programs in order to redirect traffic from host ns ingress into a container
      veth device ingress without having to go via the CPU backlog queue [0]. This can
      also be used for local containers, where the path via the CPU backlog queue then
      only needs to be taken once, not twice. At a high level, this borrows from ipvlan,
      which does a similar switch in __netif_receive_skb_core() and then iterates via
      another_round. This helps to reduce latency for the mentioned use cases.
      
      Pod to remote pod with redirect(), TCP_RR [1]:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:         122.450         (per CPU:         122.666         122.401         122.333         122.401 )
              MEAN_LATENCY:         121.210         (per CPU:         121.100         121.260         121.320         121.160 )
            STDDEV_LATENCY:         120.040         (per CPU:         119.420         119.910         125.460         115.370 )
               MIN_LATENCY:          46.500         (per CPU:          47.000          47.000          47.000          45.000 )
               P50_LATENCY:         118.500         (per CPU:         118.000         119.000         118.000         119.000 )
               P90_LATENCY:         127.500         (per CPU:         127.000         128.000         127.000         128.000 )
               P99_LATENCY:         130.750         (per CPU:         131.000         131.000         129.000         132.000 )
      
          TRANSACTION_RATE:       32666.400         (per CPU:        8152.200        8169.842        8174.439        8169.897 )
      
      Pod to remote pod with redirect_peer(), TCP_RR:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:          44.449         (per CPU:          43.767          43.127          45.279          45.622 )
              MEAN_LATENCY:          45.065         (per CPU:          44.030          45.530          45.190          45.510 )
            STDDEV_LATENCY:          84.823         (per CPU:          66.770          97.290          84.380          90.850 )
               MIN_LATENCY:          33.500         (per CPU:          33.000          33.000          34.000          34.000 )
               P50_LATENCY:          43.250         (per CPU:          43.000          43.000          43.000          44.000 )
               P90_LATENCY:          46.750         (per CPU:          46.000          47.000          47.000          47.000 )
               P99_LATENCY:          52.750         (per CPU:          51.000          54.000          53.000          53.000 )
      
          TRANSACTION_RATE:       90039.500         (per CPU:       22848.186       23187.089       22085.077       21919.130 )
      
        [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
        [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
  25. 30 Sep, 2020 1 commit
    • net: Add netif_rx_any_context() · c11171a4
      Committed by Sebastian Andrzej Siewior
      Quite a few drivers make conditional decisions based on in_interrupt() to
      invoke either netif_rx() or netif_rx_ni().

      Conditionals based on in_interrupt() or other variants of preempt count
      checks in drivers should not exist for various reasons, and Linus clearly
      requested to either split the code paths or pass an argument to the
      common functions which provides the context.

      This is obviously the correct solution, but for some of the affected
      drivers this needs a major rewrite due to their convoluted structure.

      As in_interrupt() usage in drivers needs to be phased out, provide
      netif_rx_any_context() as a stopgap for these drivers.

      This confines the in_interrupt() conditional to core code, which in turn
      allows removing access to this check from driver code and provides one
      central place to do further modifications once the driver maze is cleaned
      up.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
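The stopgap pattern from the commit above, where one central function holds the context check so callers no longer need it, can be modelled in plain C. Everything here is a hypothetical stand-in: `fake_in_interrupt` replaces the real context query, and the two `rx_from_*()` functions replace the two receive entry points.

```c
#include <assert.h>
#include <stdbool.h>

/* Which receive path was taken; only used to observe the dispatch. */
enum rx_path {
	RX_PATH_IRQ,
	RX_PATH_PROCESS,
};

/* Stand-in for the context query; a real kernel would not use a flag. */
static bool fake_in_interrupt;

static inline enum rx_path rx_from_irq(int pkt)
{
	(void)pkt;
	return RX_PATH_IRQ;
}

static inline enum rx_path rx_from_process(int pkt)
{
	(void)pkt;
	return RX_PATH_PROCESS;
}

/*
 * The only place that looks at the context. Drivers would call this
 * instead of open-coding the conditional themselves.
 */
static inline enum rx_path rx_any_context(int pkt)
{
	if (fake_in_interrupt)
		return rx_from_irq(pkt);
	return rx_from_process(pkt);
}
```

Keeping the conditional in one function means it can later be changed or removed in a single place, which is the motivation the commit message gives.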
  26. 29 Sep, 2020 3 commits
    • net: core: add nested_level variable in net_device · 1fc70edb
      Committed by Taehee Yoo
      This patch adds a new variable, 'nested_level', to the net_device
      structure.
      This variable will be used as the subclass parameter of spin_lock_nested()
      for dev->addr_list_lock.

      netif_addr_lock() can be called recursively, so spin_lock_nested() is
      used instead of spin_lock(), with dev->lower_level as its parameter.
      However, the dev->lower_level value can be updated while it is being used,
      so lockdep would warn about a possible deadlock scenario.

      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively, so spin_lock_nested() is called recursively too.
      At this moment, the dev->lower_level variable is used as its parameter.
      The dev->lower_level value is updated immediately when interfaces are
      being unlinked or linked.
      Thus, after unlinking, dev->lower_level shouldn't be used as a parameter of
      spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When interface 'A' is removed, it releases its resources.
      At this moment, netif_addr_lock() would be called.
      Then, netdev_upper_dev_unlink() is called recursively.
      Then dev->lower_level is updated.
      There is no problem.

      But when the bridge module is removed, the 'C' and 'F' interfaces
      are removed at once.
      If 'F' is removed first, the lower_level values are as follows:
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then 'C' is removed. At this moment, netif_addr_lock() is called
      recursively.
      The ordering is like this:
      C(3)->D(2)->E(1)->F(1)
      At this moment, the lower_level values of 'E' and 'F' are the same,
      so lockdep warns about a possible deadlock scenario.

      To avoid this problem, a new variable 'nested_level' is added.
      Its value is the same as dev->lower_level - 1,
      but it is only updated in rtnl_unlock().
      So, this variable can safely be used as a parameter of spin_lock_nested()
      in the rtnl context.
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
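The collision described in the nested_level commit above can be checked with a few lines of C. This is a deliberately simplified model: it ignores lock class keys and only asks whether two nested acquisitions in one chain share a subclass value. The helper name `has_subclass_collision()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if any two nested acquisitions in the chain use the
 * same subclass value, which is the situation this sketch treats as
 * a warning.
 */
static inline bool has_subclass_collision(const int *subclass, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (subclass[i] == subclass[j])
				return true;
	return false;
}
```

Using the numbers from the commit message, the chain C(3)->D(2)->E(1)->F(1) collides on the value 1. If the subclasses were instead taken from values that stay fixed until rtnl_unlock(), assumed here to be the original lower_level values minus one (3, 2, 1, 0 for C, D, E, F), the chain would have no collision.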
    • net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233
      Committed by Taehee Yoo
      Functions related to the nested interface infrastructure, such as
      netdev_walk_all_{ upper | lower }_dev(), pass both private functions
      and a "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      To make it easier to add common variables and functions,
      the new netdev_nested_priv structure is added.

      In the following patch, a new member variable will be added to this
      struct to fix the lockdep issue.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add __netdev_upper_dev_unlink() · fe8300fd
      Committed by Taehee Yoo
      netdev_upper_dev_unlink() has to work differently according to flags.
      This is the same idea as __netdev_upper_dev_link().

      In the following patches, new flags will be added.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 24 Sep, 2020 1 commit
  28. 19 Sep, 2020 2 commits
    • net: core: delete duplicated words · 4250b75b
      Committed by Randy Dunlap
      Drop repeated words in net/core/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use exponential backoff in netdev_wait_allrefs · 0e4be9e5
      Committed by Francesco Ruggeri
      The combination of aca_free_rcu, introduced in commit 2384d025
      ("net/ipv6: Add anycast addresses to a global hashtable"), and
      fib6_info_destroy_rcu, introduced in commit 9b0a8da8 ("net/ipv6:
      respect rcu grace period before freeing fib6_info"), can result in
      an extra rcu grace period being needed when deleting an interface,
      with the result that netdev_wait_allrefs ends up hitting the msleep(250),
      which is considerably longer than the required grace period.
      This can result in long delays when deleting a large number of interfaces,
      and it can be observed with this script:
      
      ns=dummy-ns
      NIFS=100
      
      ip netns add $ns
      ip netns exec $ns ip link set lo up
      ip netns exec $ns sysctl net.ipv6.conf.default.disable_ipv6=0
      ip netns exec $ns sysctl net.ipv6.conf.default.forwarding=1
      
      for ((i=0; i<$NIFS; i++))
      do
              if=eth$i
              ip netns exec $ns ip link add $if type dummy
              ip netns exec $ns ip link set $if up
              ip netns exec $ns ip -6 addr add 2021:$i::1/120 dev $if
      done
      
      for ((i=0; i<$NIFS; i++))
      do
              if=eth$i
              ip netns exec $ns ip link del $if
      done
      
      ip netns del $ns
      
      Instead of using a fixed msleep(250), this patch tries an extra
      rcu_barrier() followed by an exponential backoff.
      
      Time with this patch on a 5.4 kernel:
      
      real	0m7.704s
      user	0m0.385s
      sys	0m1.230s
      
      Time without this patch:
      
      real    0m31.522s
      user    0m0.438s
      sys     0m1.156s
      
      v2: use exponential backoff instead of trying to wake up
          netdev_wait_allrefs.
      v3: preserve reverse christmas tree ordering of local variables
      v4: try an extra rcu_barrier before the backoff, plus some
          cosmetic changes.
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
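The backoff schedule that replaces the fixed msleep(250) in the commit above can be sketched as a pure function. The constant names and the 1 ms starting value are assumptions made for this illustration; only the 250 ms ceiling comes from the commit message. The sketch computes delays and does not sleep or call rcu_barrier().

```c
#include <assert.h>

/* Assumed bounds for this sketch; only the 250 ms cap is from the text. */
#define WAIT_MIN_MSECS 1
#define WAIT_MAX_MSECS 250

/* Next delay: start at the minimum, then double up to the cap. */
static inline int next_wait(int wait)
{
	if (wait == 0)
		return WAIT_MIN_MSECS;
	wait <<= 1;
	return wait > WAIT_MAX_MSECS ? WAIT_MAX_MSECS : wait;
}

/* Total time slept if the references are still held for `polls` rounds. */
static inline int total_sleep_msecs(int polls)
{
	int wait = 0;
	int total = 0;

	for (int i = 0; i < polls; i++) {
		wait = next_wait(wait);
		total += wait;
	}
	return total;
}
```

Under these assumptions, a reference that is released within a few milliseconds costs only a few milliseconds of sleep (three polls add up to 7 ms) rather than a full 250 ms, which is consistent with the shorter deletion times reported in the commit message.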
  29. 16 Sep, 2020 1 commit