1. 11 Feb, 2019 1 commit
  2. 01 Dec, 2018 1 commit
    • net/sched: act_police: fix memory leak in case of invalid control action · fd6d4338
      Davide Caratti authored
      when users set an invalid control action, kmemleak complains as follows:
      
       # echo clear >/sys/kernel/debug/kmemleak
       # ./tdc.py -e b48b
       Test b48b: Add police action with exceed goto chain control action
       All test results:
      
       1..1
       ok 1 - b48b # Add police action with exceed goto chain control action
       about to flush the tap output if tests need to be skipped
       done flushing skipped test tap output
       # echo scan >/sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
       unreferenced object 0xffffa0fafbc3dde0 (size 96):
        comm "tc", pid 2358, jiffies 4294922738 (age 17.022s)
        hex dump (first 32 bytes):
          2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00  *.. ......}.....
          f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000648803d2>] tcf_action_init_1+0x384/0x4c0
          [<00000000cb69382e>] tcf_action_init+0x12b/0x1a0
          [<00000000847ef0d4>] tcf_action_add+0x73/0x170
          [<0000000093656e14>] tc_ctl_action+0x122/0x160
          [<0000000023c98e32>] rtnetlink_rcv_msg+0x263/0x2d0
          [<000000003493ae9c>] netlink_rcv_skb+0x4d/0x130
          [<00000000de63f8ba>] netlink_unicast+0x209/0x2d0
          [<00000000c3da0ebe>] netlink_sendmsg+0x2c1/0x3c0
          [<000000007a9e0753>] sock_sendmsg+0x33/0x40
          [<00000000457c6d2e>] ___sys_sendmsg+0x2a0/0x2f0
          [<00000000c5c6a086>] __sys_sendmsg+0x5e/0xa0
          [<00000000446eafce>] do_syscall_64+0x5b/0x180
          [<000000004aa871f2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<00000000450c38ef>] 0xffffffffffffffff
      
      change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT
      contains TC_ACT_GOTO_CHAIN extended action.
      
      Fixes: c08f5ed5 ("net/sched: act_police: disallow 'goto chain' on fallback control action")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
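A generic way to avoid this kind of leak is to validate the requested control action before allocating anything, so that the error path has nothing to free. The following is a minimal user-space sketch of that pattern, not the kernel patch itself: `struct params`, `params_create()` and the `ACT_*` constants are hypothetical names standing in for the policer parameters and the TC_ACT_GOTO_CHAIN check.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not kernel definitions. */
enum { ACT_OK = 0, ACT_GOTO_CHAIN = 1 };

struct params {
    int result;   /* fallback control action */
};

/*
 * Validate the control action first, allocate second.
 * Returns 0 and stores the new object in *out on success,
 * or a negative errno value (with *out set to NULL) on failure.
 */
int params_create(int result, struct params **out)
{
    struct params *new;

    *out = NULL;
    if (result == ACT_GOTO_CHAIN)
        return -EINVAL;           /* rejected before any allocation */

    new = calloc(1, sizeof(*new));
    if (!new)
        return -ENOMEM;

    new->result = result;
    *out = new;
    return 0;
}
```

An equally valid alternative keeps the original ordering and frees the object on the failure path; checking first simply removes the need for that cleanup.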
  3. 24 Nov, 2018 1 commit
    • net/sched: act_police: add missing spinlock initialization · 484afd1b
      Davide Caratti authored
      commit f2cbd485 ("net/sched: act_police: fix race condition on state
      variables") introduces a new spinlock, but forgets its initialization.
      Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
      action is newly created, to avoid the following lockdep splat:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       <...>
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x581/0x590
        __lock_acquire+0xd4/0x1330
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? tcf_police_init+0x55a/0x650 [act_police]
        _raw_spin_lock_bh+0x34/0x40
        ? tcf_police_init+0x2fa/0x650 [act_police]
        tcf_police_init+0x2fa/0x650 [act_police]
        tcf_action_init_1+0x384/0x4c0
        tcf_action_init+0xf6/0x160
        tcf_action_add+0x73/0x170
        tc_ctl_action+0x122/0x160
        rtnetlink_rcv_msg+0x2a4/0x490
        ? netlink_deliver_tap+0x99/0x400
        ? validate_linkmsg+0x370/0x370
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x196/0x230
        netlink_sendmsg+0x2e5/0x3e0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x280/0x2f0
        ? _raw_spin_unlock+0x24/0x30
        ? handle_pte_fault+0xafe/0xf30
        ? find_held_lock+0x2d/0x90
        ? syscall_trace_enter+0x1df/0x360
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f1841c7cf10
       Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
       RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
       RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
       R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
       R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000
      
      Fixes: f2cbd485 ("net/sched: act_police: fix race condition on state variables")
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Nov, 2018 1 commit
    • net/sched: act_police: fix race condition on state variables · f2cbd485
      Davide Caratti authored
      after 'police' configuration parameters were converted to use RCU instead
      of spinlock, the state variables used to compute the traffic rate (namely
      'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in
      the traffic path without any protection.
      
      Use a dedicated spinlock to avoid race conditions on these variables, and
      ensure proper cache-line alignment. In this way, 'police' is still faster
      than what we observed when 'tcf_lock' was used in the traffic path, i.e.
      reverting commit 2d550dba ("net/sched: act_police: don't use spinlock
      in the data path"). Moreover, we preserve the throughput improvement that
      was obtained after 'police' started using per-cpu counters, when 'avrate'
      is used instead of 'rate'.
      
      Changes since v1 (thanks to Eric Dumazet):
      - call ktime_get_ns() before acquiring the lock in the traffic path
      - use a dedicated spinlock instead of tcf_lock
      - improve cache-line usage
      
      Fixes: 2d550dba ("net/sched: act_police: don't use spinlock in the data path")
      Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
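The locking scheme described in this commit can be modelled in user space as a token bucket whose mutable state sits behind its own small lock, with the timestamp sampled by the caller before the lock is taken. This is a rough sketch under stated assumptions, not the act_police code: `struct bucket`, `bucket_init()` and `bucket_conform()` are hypothetical names, a C11 `atomic_flag` stands in for the kernel spinlock, and tokens are counted in abstract units.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative only. */
struct bucket {
    atomic_flag lock;   /* dedicated lock: guards only the fields below */
    int64_t toks;       /* tokens currently available */
    int64_t burst;      /* upper bound on accumulated tokens */
    int64_t t_c;        /* time of the last successful update */
};

static void bucket_lock(struct bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* The lock must be initialized whenever a bucket is newly created. */
void bucket_init(struct bucket *b, int64_t burst, int64_t now)
{
    atomic_flag_clear(&b->lock);
    b->toks = burst;
    b->burst = burst;
    b->t_c = now;
}

/*
 * 'now' is read by the caller before this function takes the lock, which
 * keeps the clock read out of the critical section.
 * Returns true if 'cost' tokens were available and have been consumed.
 */
bool bucket_conform(struct bucket *b, int64_t now, int64_t cost)
{
    bool ok = false;
    int64_t toks;

    bucket_lock(b);
    toks = now - b->t_c;            /* tokens earned since last update */
    if (toks > b->burst)
        toks = b->burst;
    toks += b->toks;
    if (toks > b->burst)
        toks = b->burst;
    toks -= cost;
    if (toks >= 0) {                /* conforming: commit the new state */
        b->t_c = now;
        b->toks = toks;
        ok = true;
    }
    bucket_unlock(b);
    return ok;
}
```

A caller would sample its clock, then call `bucket_conform(&b, now, cost)`; a non-conforming request leaves the state untouched.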
  5. 23 Oct, 2018 1 commit
  6. 17 Sep, 2018 2 commits
  7. 01 Sep, 2018 1 commit
  8. 22 Aug, 2018 1 commit
  9. 14 Aug, 2018 1 commit
  10. 12 Aug, 2018 1 commit
  11. 08 Jul, 2018 5 commits
  12. 28 Mar, 2018 1 commit
  13. 22 Mar, 2018 1 commit
    • net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818
      Davide Caratti authored
      tcf_act_police_init() can fail after the idr has been successfully
      reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
      subsequent attempts to configure a police rule using the same idr value
      systematically fail with -ENOSPC:
      
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       ...
      
      Fix this in the error path of tcf_act_police_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
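The failure mode above (one -ENOMEM followed by repeated -ENOSPC for the same index) comes from a reservation that is never undone. A small user-space model of the corrected error path is sketched below; the slot table, `slot_reserve()`, `slot_release()` and `rule_init()` are hypothetical and only imitate the reserve/release pairing, not the kernel's idr API.

```c
#include <errno.h>
#include <stdbool.h>

#define NSLOTS 8

/* Hypothetical index table standing in for the reserved-index store. */
static bool slot_used[NSLOTS];

static int slot_reserve(unsigned int idx)
{
    if (idx >= NSLOTS)
        return -EINVAL;
    if (slot_used[idx])
        return -ENOSPC;      /* index is already taken */
    slot_used[idx] = true;
    return 0;
}

static void slot_release(unsigned int idx)
{
    slot_used[idx] = false;
}

/*
 * Create a rule at 'idx'. 'fail_late' models a failure that happens
 * after the index was reserved (for example, a failed table lookup).
 */
int rule_init(unsigned int idx, bool fail_late)
{
    int err = slot_reserve(idx);

    if (err)
        return err;

    if (fail_late) {
        slot_release(idx);   /* undo the reservation on the error path */
        return -ENOMEM;
    }
    return 0;
}
```

With the release in place, a failed attempt leaves the index free, so a later attempt with the same index can succeed instead of reporting that no space is left.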
  14. 28 Feb, 2018 1 commit
    • net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations · 685ecfb1
      Kirill Tkhai authored
      These pernet_operations are from net/sched directory, and they call only
      tc_action_net_init() and tc_action_net_exit():
      
      bpf_net_ops
      connmark_net_ops
      csum_net_ops
      gact_net_ops
      ife_net_ops
      ipt_net_ops
      xt_net_ops
      mirred_net_ops
      nat_net_ops
      pedit_net_ops
      police_net_ops
      sample_net_ops
      simp_net_ops
      skbedit_net_ops
      skbmod_net_ops
      tunnel_key_net_ops
      vlan_net_ops
      
      1) tc_action_net_init() just allocates and initializes per-net memory.
      2) There should be no in-flight packets at the time of the
      tc_action_net_exit() call, and no other pernet_operations should send
      packets to the dying net (except netlink). So, it seems they can be
      marked as async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 17 Feb, 2018 4 commits
  16. 22 Dec, 2017 1 commit
  17. 14 Dec, 2017 1 commit
  18. 09 Nov, 2017 1 commit
  19. 03 Nov, 2017 1 commit
  20. 31 Aug, 2017 1 commit
  21. 15 Jun, 2017 1 commit
  22. 14 Apr, 2017 1 commit
  23. 06 Dec, 2016 1 commit
    • net_sched: gen_estimator: complete rewrite of rate estimators · 1c0d32fd
      Eric Dumazet authored
      1) Old code was hard to maintain, due to complex lock chains.
         (We probably will be able to remove some kfree_rcu() in callers)
      
      2) Using a single timer to update all estimators does not scale.
      
      3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
         is not supposed to work well)
      
      In this rewrite :
      
      - I removed the RB tree that had to be scanned in
        gen_estimator_active(). qdisc dumps should be much faster.
      
      - Each estimator has its own timer.
      
      - Estimations are maintained in net_rate_estimator structure,
        instead of dirtying the qdisc. Minor, but part of the simplification.
      
      - Reading the estimator uses RCU and a seqcount to provide proper
        support for 32bit kernels.
      
      - We reduce memory need when estimators are not used, since
        we store a pointer, instead of the bytes/packets counters.
      
      - xt_rateest_mt() no longer has to grab a spinlock.
        (In the future, xt_rateest_tg() could be switched to per cpu counters)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
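The sequence-counter idea mentioned in the rewrite (readers retry if a writer was active, so multi-word values are never observed half-updated) can be illustrated in portable C11. This is only a single-writer sketch under assumptions: `struct rate_est`, `est_init()`, `est_write()` and `est_read()` are hypothetical names, C11 atomics replace the kernel's seqcount primitives, and the RCU part of the real design is not modelled.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical estimator snapshot protected by a sequence counter. */
struct rate_est {
    atomic_uint seq;        /* even: stable, odd: write in progress */
    _Atomic uint64_t bps;   /* bytes per second */
    _Atomic uint64_t pps;   /* packets per second */
};

void est_init(struct rate_est *e)
{
    atomic_init(&e->seq, 0);
    atomic_init(&e->bps, 0);
    atomic_init(&e->pps, 0);
}

/* Single writer: make the counter odd, update, then make it even again. */
void est_write(struct rate_est *e, uint64_t bps, uint64_t pps)
{
    unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

    atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->bps, bps, memory_order_relaxed);
    atomic_store_explicit(&e->pps, pps, memory_order_relaxed);
    atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader: retry until both values were read under one even sequence. */
void est_read(struct rate_est *e, uint64_t *bps, uint64_t *pps)
{
    unsigned int s1, s2;

    do {
        s1 = atomic_load_explicit(&e->seq, memory_order_acquire);
        *bps = atomic_load_explicit(&e->bps, memory_order_relaxed);
        *pps = atomic_load_explicit(&e->pps, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&e->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);
}
```

Readers take no lock and never block the writer; the cost is an occasional retry when a read overlaps an update.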
  24. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan authored
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero-extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 20 Sep, 2016 2 commits
  26. 18 Aug, 2016 2 commits
  27. 26 Jul, 2016 2 commits
    • net_sched: get rid of struct tcf_common · ec0595cc
      WANG Cong authored
      After the previous patch, struct tc_action should be enough
      to represent the generic tc action, tcf_common is not necessary
      any more. This patch gets rid of it to make tc action code
      more readable.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move tc_action into tcf_common · a85a970a
      WANG Cong authored
      struct tc_action is confusing, currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use, by moving
      tc_action into tcf_common, so that they are allocated together
      in hashtable and can be cast'ed easily.
      
      And together with the following patch, we could really make
      tc_action a generic representation for all tc actions and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
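The "generic representation that each action type inherits from" can be pictured as plain C struct embedding: the generic part is the first member of every specific action, so a pointer to one can be converted to the other. The miniature below is an illustration only; `struct action`, `struct police_action`, `to_police()` and `police_burst()` are hypothetical and much smaller than the kernel structures.

```c
#include <stddef.h>

/* Hypothetical generic part shared by every action type. */
struct action {
    unsigned int index;
    int refcnt;
};

/* A specific action embeds the generic part as its first member. */
struct police_action {
    struct action common;   /* must stay first for the cast below */
    long long burst;
};

/*
 * Convert a generic pointer back to the specific type. This is valid
 * because a pointer to a struct and a pointer to its first member
 * refer to the same address.
 */
struct police_action *to_police(struct action *a)
{
    return (struct police_action *)a;
}

/* Generic code passes 'struct action *'; type-specific code downcasts. */
long long police_burst(struct action *a)
{
    return to_police(a)->burst;
}
```

Because both parts live in one allocation, generic helpers can operate on `struct action *` while each action module reaches its private fields without a separate lookup.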
  28. 15 Jun, 2016 1 commit
  29. 08 Jun, 2016 1 commit
    • act_police: fix a crash during removal · a03e6fe5
      WANG Cong authored
      The police action is using its own code to initialize tcf hash
      info, which makes us forget to initialize a->hinfo correctly.
      Fix this by calling the helper function tcf_hash_create() directly.
      
      This patch fixed the following crash:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
       IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
       PGD d3c34067 PUD d3e18067 PMD 0
       Oops: 0000 [#1] SMP
       CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
       RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
       RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
       RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
       RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
       R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
       R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
       Stack:
        ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
        0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
        ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
       Call Trace:
        <IRQ>
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
        [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
        [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
        [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
        [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
        [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
        [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
        [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
        [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
        [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
        [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
        [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
        [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4
      
      Fixes: ddf97ccd ("net_sched: add network namespace support for tc actions")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>