1. 15 Jun, 2017 1 commit
  2. 14 Apr, 2017 1 commit
  3. 06 Dec, 2016 1 commit
    • E
      net_sched: gen_estimator: complete rewrite of rate estimators · 1c0d32fd
      Eric Dumazet committed
      1) Old code was hard to maintain, due to complex lock chains.
         (We probably will be able to remove some kfree_rcu() in callers)
      
      2) Using a single timer to update all estimators does not scale.
      
      3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
         is not supposed to work well)
      
      In this rewrite :
      
      - I removed the RB tree that had to be scanned in
        gen_estimator_active(). qdisc dumps should be much faster.
      
      - Each estimator has its own timer.
      
      - Estimations are maintained in net_rate_estimator structure,
        instead of dirtying the qdisc. Minor, but part of the simplification.
      
      - Reading the estimator uses RCU and a seqcount to provide proper
        support for 32bit kernels.
      
      - We reduce memory need when estimators are not used, since
        we store a pointer, instead of the bytes/packets counters.
      
      - xt_rateest_mt() no longer has to grab a spinlock.
        (In the future, xt_rateest_tg() could be switched to per cpu counters)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
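The seqcount-based read mentioned in the commit above can be illustrated with a small user-space sketch. This is not the kernel code: `struct rate_est`, `rate_est_write()` and `rate_est_read()` are hypothetical names, C11 atomics stand in for the kernel's seqcount helpers, and a single writer is assumed. The 64-bit value is deliberately stored as two 32-bit halves to mimic a 32-bit machine, where a plain 64-bit load could observe a half-updated value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an estimator: one 64-bit rate kept as two
 * 32-bit halves, as a 32-bit CPU would effectively store it. */
struct rate_est {
    atomic_uint seq;          /* odd while a write is in progress */
    _Atomic uint32_t bps_lo;  /* low 32 bits of the rate */
    _Atomic uint32_t bps_hi;  /* high 32 bits of the rate */
};

/* Writer side (assumption: only one writer ever runs at a time). */
static void rate_est_write(struct rate_est *e, uint64_t bps)
{
    unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

    assert((s & 1u) == 0); /* no other write may be in flight */
    atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->bps_lo, (uint32_t)bps, memory_order_relaxed);
    atomic_store_explicit(&e->bps_hi, (uint32_t)(bps >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader side: retry until both halves were read under the same,
 * even sequence number, so a torn value is never returned. */
static uint64_t rate_est_read(struct rate_est *e)
{
    unsigned int start;
    uint32_t lo, hi;

    do {
        start = atomic_load_explicit(&e->seq, memory_order_acquire);
        lo = atomic_load_explicit(&e->bps_lo, memory_order_relaxed);
        hi = atomic_load_explicit(&e->bps_hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1u) ||
             start != atomic_load_explicit(&e->seq, memory_order_relaxed));

    return ((uint64_t)hi << 32) | lo;
}
```

Readers never take a lock in this scheme; they only repeat the read if a write overlapped it, which is why it suits data that is read often and written rarely.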
  4. 18 Nov, 2016 1 commit
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      is therefore an unsigned entity. Using a negative value is an
      out-of-bounds access by definition.
      
      2)
      On x86_64, unsigned 32-bit data which are mixed with pointers
      via array indexing, or offsets added to or subtracted from pointers,
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately, some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers, so every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
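To see the effect described in the commit above for yourself, a pair of lookups that differ only in the signedness of the index can be compiled and compared. The following is a minimal sketch, not the kernel's net_generic(): `struct net_generic_sketch`, `lookup_signed()` and `lookup_unsigned()` are hypothetical names, and the fixed table size is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-namespace pointer table. */
struct net_generic_sketch {
    size_t len;     /* number of valid slots in ptr[] */
    void *ptr[8];
};

/* Signed id: the 32-bit index has to be widened to 64 bits with sign
 * extension before it can be used in the address computation. */
static void *lookup_signed(const struct net_generic_sketch *ng, int id)
{
    assert(id >= 1 && (size_t)id <= ng->len);
    return ng->ptr[id - 1];
}

/* Unsigned id: widening is a zero extension, which x86_64 performs
 * implicitly on 32-bit register writes. */
static void *lookup_unsigned(const struct net_generic_sketch *ng,
                             unsigned int id)
{
    assert(id >= 1 && id <= ng->len);
    return ng->ptr[id - 1];
}
```

Both functions return the same pointer for any valid id; only the generated code may differ. Compiling with something like `gcc -O2 -S` and diffing the two function bodies shows whether a sign-extension instruction appears, though the exact output depends on the compiler version and flags.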
  5. 20 Sep, 2016 2 commits
  6. 18 Aug, 2016 2 commits
  7. 26 Jul, 2016 2 commits
    • W
      net_sched: get rid of struct tcf_common · ec0595cc
      WANG Cong committed
      After the previous patch, struct tc_action should be enough
      to represent the generic tc action, tcf_common is not necessary
      any more. This patch gets rid of it to make tc action code
      more readable.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      net_sched: move tc_action into tcf_common · a85a970a
      WANG Cong committed
      struct tc_action is confusing, currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use, by moving
      tc_action into tcf_common, so that they are allocated together
      in hashtable and can be cast'ed easily.
      
      And together with the following patch, we could really make
      tc_action a generic representation for all tc actions and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
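The idea of each action type "inheriting" from a generic tc_action can be sketched in plain C by embedding the generic struct as the first member of the specific one. The names below (`tc_action_sketch`, `police_sketch`, `to_police()`, `action_index()`) and all fields are hypothetical and much smaller than the real kernel structures; this only illustrates the embedding and cast pattern the commit message refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Generic part shared by every action type (hypothetical fields). */
struct tc_action_sketch {
    unsigned int index;
    int refcnt;
};

/* A specific action embeds the generic part as its first member, so both
 * are allocated together and a pointer to one is a pointer to the other. */
struct police_sketch {
    struct tc_action_sketch common;  /* must stay first for the cast below */
    unsigned long rate_bytes_ps;     /* type-specific data */
};

_Static_assert(offsetof(struct police_sketch, common) == 0,
               "generic part must be the first member");

/* Downcast from the generic representation to the specific type. */
static struct police_sketch *to_police(struct tc_action_sketch *a)
{
    assert(a != NULL);
    return (struct police_sketch *)a;
}

/* Generic code only ever needs the common part. */
static unsigned int action_index(const struct tc_action_sketch *a)
{
    return a->index;
}
```

Generic helpers take a `struct tc_action_sketch *` and never need to know the concrete type, while type-specific code recovers its own struct with a cast instead of copying fields in and out.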
  8. 15 Jun, 2016 1 commit
  9. 08 Jun, 2016 3 commits
    • W
      act_police: fix a crash during removal · a03e6fe5
      WANG Cong committed
      The police action uses its own code to initialize tcf hash
      info, which made us forget to initialize a->hinfo correctly.
      Fix this by calling the helper function tcf_hash_create() directly.
      
      This patch fixed the following crash:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
       IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
       PGD d3c34067 PUD d3e18067 PMD 0
       Oops: 0000 [#1] SMP
       CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
       RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
       RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
       RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
       RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
       R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
       R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
       Stack:
        ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
        0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
        ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
       Call Trace:
        <IRQ>
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
        [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
        [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
        [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
        [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
        [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
        [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
        [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
        [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
        [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
        [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
        [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
        [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
        [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
        [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4
      
      Fixes: ddf97ccd ("net_sched: add network namespace support for tc actions")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Eric Dumazet committed
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
      agent [1] are problematic at scale :
      
      For each qdisc/class found in the dump, we currently lock the root qdisc
      spinlock in order to get stats. Sampling stats every 5 seconds from
      thousands of HTB classes is a challenge when the root qdisc spinlock is
      under high pressure. Not only do the dumps take time, they also slow
      down the fast path (queue/dequeue packets) by 10% to 20% in some cases.
      
      An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
      that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats()
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
      
      I also changed rate estimators to use the same infrastructure
      so that they no longer need to lock root qdisc lock.
      
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net sched actions: introduce timestamp for firsttime use · 53eb440f
      Jamal Hadi Salim committed
      Useful to know when the action was first used for accounting
      (and debugging)
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 25 May, 2016 1 commit
  11. 26 Feb, 2016 1 commit
  12. 07 Nov, 2014 1 commit
  13. 30 Sep, 2014 1 commit
    • J
      net: sched: make bstats per cpu and estimator RCU safe · 22e0f8b9
      John Fastabend committed
      In order to run qdiscs without locking, statistics and estimators
      need to be handled correctly.
      
      To resolve bstats, make the statistics per cpu. Because this is
      only needed for qdiscs that run without locks, which will not be
      the case for most qdiscs in the near future, only create percpu
      stats when qdiscs set the TCQ_F_CPUSTATS flag.
      
      Next, because estimators use the bstats to calculate packets per
      second and bytes per second, the estimator code paths are updated
      to use the per cpu statistics.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
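The per-cpu statistics idea in the commit above can be modelled in a few lines of user-space C: each CPU updates only its own slot, and a reader such as an estimator folds all slots into one total. Everything here is a hypothetical simplification (`bstats_sketch`, `NR_CPUS_SKETCH`, `bstats_update()`, `bstats_sum()`); the real kernel uses its percpu allocator and additional synchronization that this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4  /* assumed CPU count for the example */

/* Byte and packet counters, one instance per CPU. */
struct bstats_sketch {
    uint64_t bytes;
    uint64_t packets;
};

/* Fast path: a CPU touches only its own slot, so no shared lock is
 * needed to account for a packet. */
static void bstats_update(struct bstats_sketch *percpu, unsigned int cpu,
                          uint64_t len)
{
    assert(cpu < NR_CPUS_SKETCH);
    percpu[cpu].bytes += len;
    percpu[cpu].packets += 1;
}

/* Slow path (dump or estimator): fold every per-CPU slot into one total. */
static struct bstats_sketch bstats_sum(const struct bstats_sketch *percpu)
{
    struct bstats_sketch total = { 0, 0 };

    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        total.bytes += percpu[cpu].bytes;
        total.packets += percpu[cpu].packets;
    }
    return total;
}
```

The trade-off is that updates become cheap and contention-free while reads become proportional to the number of CPUs, which fits counters that are bumped per packet but sampled only occasionally.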
  14. 23 Aug, 2014 1 commit
  15. 13 Feb, 2014 3 commits
  16. 22 Jan, 2014 2 commits
  17. 20 Jan, 2014 1 commit
  18. 17 Jan, 2014 1 commit
  19. 14 Jan, 2014 1 commit
  20. 28 Dec, 2013 1 commit
    • J
      net_sched: act: Dont increment refcnt on replace · 1a29321e
      Jamal Hadi Salim committed
       This is a bug fix. The existing code tries to kill many
       birds with one stone: Handling binding of actions to
       filters, new actions and replacing of action
       attributes. A simple test case to illustrate:
      
      XXXX
       moja@fe1:~$ sudo tc actions add action drop index 12
       moja@fe1:~$ actions get action gact index 12
       action order 1: gact action drop
        random type none pass val 0
        index 12 ref 1 bind 0
       moja@fe1:~$ sudo tc actions replace action ok index 12
       moja@fe1:~$ actions get action gact index 12
       action order 1: gact action drop
        random type none pass val 0
        index 12 ref 2 bind 0
      XXXX
      
      The above shows the refcount being wrongly incremented on replace.
      There are more complex scenarios with binding of actions to filters,
      which I am leaving out, that didn't work either.
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 21 Dec, 2013 1 commit
  22. 20 Dec, 2013 1 commit
  23. 19 Dec, 2013 2 commits
  24. 06 Dec, 2013 1 commit
  25. 21 Sep, 2013 1 commit
  26. 03 Jun, 2013 1 commit
  27. 13 Feb, 2013 2 commits
  28. 15 Jan, 2013 1 commit
  29. 02 Apr, 2012 1 commit
  30. 06 Jul, 2011 1 commit