1. 23 10月, 2018 1 次提交
  2. 25 9月, 2018 1 次提交
  3. 01 9月, 2018 1 次提交
  4. 22 8月, 2018 1 次提交
  5. 20 8月, 2018 1 次提交
    • V
      net: sched: always disable bh when taking tcf_lock · 653cd284
      Vlad Buslov 提交于
      Recently, ops->init() and ops->dump() of all actions were modified to
      always obtain tcf_lock when accessing private action state. Actions that
      don't depend on tcf_lock for synchronization with their data path use
      non-bh locking API. However, tcf_lock is also used to protect rate
      estimator stats in softirq context by timer callback.
      
      Change ops->init() and ops->dump() of all actions to disable bh when using
      tcf_lock to prevent deadlock reported by following lockdep warning:
      
      [  105.470398] ================================
      [  105.475014] WARNING: inconsistent lock state
      [  105.479628] 4.18.0-rc8+ #664 Not tainted
      [  105.483897] --------------------------------
      [  105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
      [  105.509696] {SOFTIRQ-ON-W} state was registered at:
      [  105.514925]   _raw_spin_lock+0x2c/0x40
      [  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
      [  105.523990]   tcf_action_init_1+0x4e4/0x660
      [  105.528518]   tcf_action_init+0x1ce/0x2d0
      [  105.532880]   tcf_exts_validate+0x1d8/0x200
      [  105.537416]   fl_change+0x55a/0x268b [cls_flower]
      [  105.542469]   tc_new_tfilter+0x748/0xa20
      [  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
      [  105.551268]   netlink_rcv_skb+0x18d/0x200
      [  105.555628]   netlink_unicast+0x2d0/0x370
      [  105.559990]   netlink_sendmsg+0x3b9/0x6a0
      [  105.564349]   sock_sendmsg+0x6b/0x80
      [  105.568271]   ___sys_sendmsg+0x4a1/0x520
      [  105.572547]   __sys_sendmsg+0xd7/0x150
      [  105.576655]   do_syscall_64+0x72/0x2c0
      [  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  105.586243] irq event stamp: 489296
      [  105.590084] hardirqs last  enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
      [  105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
      [  105.609277] softirqs last  enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
      [  105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
      [  105.626813]
                     other info that might help us debug this:
      [  105.633976]  Possible unsafe locking scenario:
      
      [  105.640526]        CPU0
      [  105.643325]        ----
      [  105.646125]   lock(&(&p->tcfa_lock)->rlock);
      [  105.650747]   <Interrupt>
      [  105.653717]     lock(&(&p->tcfa_lock)->rlock);
      [  105.658514]
                      *** DEADLOCK ***
      
      [  105.665349] 1 lock held by swapper/16/0:
      [  105.669629]  #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
      [  105.678200]
                     stack backtrace:
      [  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
      [  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  105.698626] Call Trace:
      [  105.701421]  <IRQ>
      [  105.703791]  dump_stack+0x92/0xeb
      [  105.707461]  print_usage_bug+0x336/0x34c
      [  105.711744]  mark_lock+0x7c9/0x980
      [  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
      [  105.721424]  ? check_usage_forwards+0x230/0x230
      [  105.726315]  __lock_acquire+0x923/0x26f0
      [  105.730597]  ? debug_show_all_locks+0x240/0x240
      [  105.735478]  ? mark_lock+0x493/0x980
      [  105.739412]  ? check_chain_key+0x140/0x1f0
      [  105.743861]  ? __lock_acquire+0x836/0x26f0
      [  105.748323]  ? lock_acquire+0x12e/0x290
      [  105.752516]  lock_acquire+0x12e/0x290
      [  105.756539]  ? est_fetch_counters+0x3c/0xa0
      [  105.761084]  _raw_spin_lock+0x2c/0x40
      [  105.765099]  ? est_fetch_counters+0x3c/0xa0
      [  105.769633]  est_fetch_counters+0x3c/0xa0
      [  105.773995]  est_timer+0x87/0x390
      [  105.777670]  ? est_fetch_counters+0xa0/0xa0
      [  105.782210]  ? lock_acquire+0x12e/0x290
      [  105.786410]  call_timer_fn+0x161/0x550
      [  105.790512]  ? est_fetch_counters+0xa0/0xa0
      [  105.795055]  ? del_timer_sync+0xd0/0xd0
      [  105.799249]  ? __lock_is_held+0x93/0x110
      [  105.803531]  ? mark_held_locks+0x20/0xe0
      [  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
      [  105.812525]  ? est_fetch_counters+0xa0/0xa0
      [  105.817069]  ? est_fetch_counters+0xa0/0xa0
      [  105.821610]  run_timer_softirq+0x3c4/0x9f0
      [  105.826064]  ? lock_acquire+0x12e/0x290
      [  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
      [  105.835237]  ? __lock_is_held+0x25/0x110
      [  105.839517]  __do_softirq+0x11d/0x7bf
      [  105.843542]  irq_exit+0x140/0x190
      [  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
      [  105.852182]  apic_timer_interrupt+0xf/0x20
      [  105.856628]  </IRQ>
      [  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
      [  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
      [  105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [  105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
      [  105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
      [  105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
      [  105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
      [  105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
      [  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
      [  105.935232]  do_idle+0x28e/0x320
      [  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
      [  105.943361]  ? mark_lock+0x8c1/0x980
      [  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
      [  105.952619]  cpu_startup_entry+0xc2/0xd0
      [  105.956900]  ? cpu_in_idle+0x20/0x20
      [  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
      [  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
      [  105.971391]  start_secondary+0x2b5/0x360
      [  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
      [  105.980654]  secondary_startup_64+0xa5/0xb0
      
      Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
      warning regarding possible irq lock inversion dependency between tcf_lock,
      and psample_groups_lock that is taken when holding tcf_lock in sample init:
      
      [  162.108959]  Possible interrupt unsafe locking scenario:
      
      [  162.116386]        CPU0                    CPU1
      [  162.121277]        ----                    ----
      [  162.126162]   lock(psample_groups_lock);
      [  162.130447]                                local_irq_disable();
      [  162.136772]                                lock(&(&p->tcfa_lock)->rlock);
      [  162.143957]                                lock(psample_groups_lock);
      [  162.150813]   <Interrupt>
      [  162.153808]     lock(&(&p->tcfa_lock)->rlock);
      [  162.158608]
                      *** DEADLOCK ***
      
      In order to prevent potential lock inversion dependency between tcf_lock
      and psample_groups_lock, extract call to psample_group_get() from tcf_lock
      protected section in sample action init function.
      
      Fixes: 4e232818 ("net: sched: act_mirred: remove dependency on rtnl lock")
      Fixes: 764e9a24 ("net: sched: act_vlan: remove dependency on rtnl lock")
      Fixes: 729e0126 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
      Fixes: d7728495 ("net: sched: act_sample: remove dependency on rtnl lock")
      Fixes: e8917f43 ("net: sched: act_gact: remove dependency on rtnl lock")
      Fixes: b6a2b971 ("net: sched: act_csum: remove dependency on rtnl lock")
      Fixes: 2142236b ("net: sched: act_bpf: remove dependency on rtnl lock")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      653cd284
  6. 14 8月, 2018 1 次提交
  7. 12 8月, 2018 1 次提交
  8. 08 7月, 2018 5 次提交
  9. 28 3月, 2018 1 次提交
  10. 10 3月, 2018 1 次提交
  11. 28 2月, 2018 1 次提交
    • K
      net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations · 685ecfb1
      Kirill Tkhai 提交于
      These pernet_operations are from net/sched directory, and they call only
      tc_action_net_init() and tc_action_net_exit():
      
      bpf_net_ops
      connmark_net_ops
      csum_net_ops
      gact_net_ops
      ife_net_ops
      ipt_net_ops
      xt_net_ops
      mirred_net_ops
      nat_net_ops
      pedit_net_ops
      police_net_ops
      sample_net_ops
      simp_net_ops
      skbedit_net_ops
      skbmod_net_ops
      tunnel_key_net_ops
      vlan_net_ops
      
      1)tc_action_net_init() just allocates and initializes per-net memory.
      2)There should not be in-flight packets at the time of tc_action_net_exit()
      call, or another pernet_operations send packets to dying net (except
      netlink). So, it seems they can be marked as async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      685ecfb1
  12. 17 2月, 2018 4 次提交
  13. 03 1月, 2018 1 次提交
  14. 14 12月, 2017 1 次提交
  15. 09 11月, 2017 1 次提交
  16. 03 11月, 2017 1 次提交
  17. 31 8月, 2017 1 次提交
  18. 14 4月, 2017 1 次提交
  19. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  20. 20 9月, 2016 1 次提交
  21. 26 7月, 2016 1 次提交
    • W
      net_sched: move tc_action into tcf_common · a85a970a
      WANG Cong 提交于
      struct tc_action is confusing, currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use, by moving
      tc_action into tcf_common, so that they are allocated together
      in hashtable and can be cast'ed easily.
      
      And together with the following patch, we could really make
      tc_action a generic representation for all tc actions and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a85a970a
  22. 08 6月, 2016 3 次提交
  23. 17 5月, 2016 1 次提交
  24. 27 4月, 2016 1 次提交
  25. 26 2月, 2016 1 次提交
  26. 09 7月, 2015 5 次提交
  27. 07 11月, 2014 1 次提交