1. 11 Aug 2016, 1 commit
  2. 09 Aug 2016, 2 commits
    • net/sched/sch_hfsc.c: remove unused cl_myfadj · 37088f61
      Committed by Michal Soltys
      The code using this variable has been commented out in the past as it
      was causing issues in upperlimited link-sharing scenarios.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: keep fsc and virtual times in sync; fix an old bug · 678a6241
      Committed by Michal Soltys
      This patch simplifies how we update fsc and calculate vt from it, while
      keeping the expected functionality identical to how hfsc currently
      behaves. It also fixes an issue introduced by a very old patch.
      
      The idea is that instead of correcting cl_vt before the fsc curve update
      (rtsc_min) and correcting it again after the calculation (rtsc_y2x) to
      keep cl_vt local to the current period, we can simply rely on virtual
      times and curve values always being in sync, analogously to how rsc and
      usc work, except that we use virtual time here.
      
      Why wasn't it done this way from the beginning? The likely reason
      (judging by the code trying to correct curves whenever possible) was to
      keep the virtual times as small as possible, as they have a tendency to
      "gallop" forward whenever their siblings and other fair-sharing subtrees
      are idling. On top of that, the current code is subtly bugged, so a
      cumulative time (without any corrections) is always kept and used in
      init_vf() when a new backlog period begins (using cl_cvtoff).
      
      Is a cumulative value safe? Generally yes, though corner cases are easy
      to create. For example, consider:
      
      1gbit interface
      some 100kbit leaf, everything else idle
      
      With the current tick (64ns), 1s is 15625000 ticks, but the leaf is
      alone and this is its virtual time, so in reality it is 10000 times
      more. In other words, 38 bits are needed to hold 1 second; 54 bits hold
      1 day, 59 bits 1 month, and 63 bits 1 year (all logarithms rounded up).
      That is getting somewhat dangerous, but it also requires a setup that
      justifies such values, not to mention a class permanently backlogged for
      a year. In the near-most-extreme case (10gbit link, 10kbit leaf), 64
      bits are "enough" to hold ~13.6 days.
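
      The estimates above can be checked with a quick back-of-the-envelope
      calculation. The following is a standalone userspace sketch (plain C,
      compile with -lm), not kernel code; the 64ns tick and the link/leaf
      ratios are taken from the example above:

      #include <stdio.h>
      #include <math.h>

      int main(void)
      {
              double ticks_per_sec = 1e9 / 64.0;   /* 64ns tick -> 15625000 */
              double vt_ratio      = 1e9 / 100e3;  /* 1gbit link, 100kbit leaf -> 10000 */
              double vt_per_sec    = ticks_per_sec * vt_ratio;

              printf("bits for 1 second: %.0f\n", ceil(log2(vt_per_sec)));              /* 38 */
              printf("bits for 1 day   : %.0f\n", ceil(log2(vt_per_sec * 86400)));      /* 54 */
              printf("bits for 1 month : %.0f\n", ceil(log2(vt_per_sec * 86400 * 30))); /* 59 */
              printf("bits for 1 year  : %.0f\n", ceil(log2(vt_per_sec * 86400 * 365)));/* 63 */

              /* worst case from the text: 10gbit link, 10kbit leaf */
              double worst = ticks_per_sec * (10e9 / 10e3);
              printf("days to 64-bit overflow: %.1f\n", pow(2, 64) / worst / 86400);    /* ~13.6 */
              return 0;
      }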
      
      Well, the issue remains mostly theoretical and cl_cvtoff has been
      working fine all these years. Sensible configurations are de facto
      immune to this issue, and not-so-sensible ones can solve it with a
      cronjob whose period is inversely proportional to the insanity of the
      setup =)
      
      Now let's explain the subtle bug mentioned earlier.
      
      The issue is related to how offsets are kept and how we calculate
      virtual times and update the fair service curve(s). It is subtle, but
      easy to observe with long m1 segments. It was introduced in a rather
      old patch:
      
      Commit 99296150c7: "[NET_SCHED]: O(1) children vtoff adjustment
      in HFSC scheduler"
      
      (available in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git)
      
      Originally, when a new backlog period was started, cl_vtoff of each
      sibling was updated with cl_cvtmax from the past period, naturally
      moving all cl_vt to the proper starting point. That patch changed it so
      that the cumulative offset is kept in the parent and there is no need to
      traverse the list (as any subsequent child activation derives its new vt
      from already active sibling(s)).
      
      But with this change, cl_vtoff (of each sibling) is no longer
      persistent across inactivity periods, as it is recalculated from the
      parent's cl_cvtoff on a new backlog period, conflicting with the
      following curve correction from the previous period:
      
      if (cl->cl_virtual.x == vt) {
              cl->cl_virtual.x -= cl->cl_vtoff;
              cl->cl_vtoff = 0;
      }
      
      This essentially tries to keep the curve as if it were local to the
      period, and resets cl_vtoff (the cumulative vt offset of the class) to 0
      when possible (read: when we have an intersection, or if the new curve
      is below the old one). But then it is recalculated from cl_cvtoff on the
      next active period. The rtsc_min() call preceding the above if() then
      doesn't really do what we expect it to in such a scenario, as it
      calculates the minimum of the corrected curve (from the previous backlog
      period) and the new uncorrected curve (with its offset derived from
      cl_cvtoff).
      
      Example:
      
      tc class add dev $ife parent 1:0 classid 1:1  hfsc ls m2 100mbit ul m2 100mbit
      tc class add dev $ife parent 1:1 classid 1:10 hfsc ls m1 80mbit d 10s m2 20mbit
      tc class add dev $ife parent 1:1 classid 1:11 hfsc ls m2 20mbit
      
      With A = class 1:10 and B = class 1:11:

      start B, keep it backlogged, let it run 6s (30s worth of vt, as A is idle)
      pause B briefly to force a cl_cvtoff update in the parent (the whole 1:1 going idle)
      start A, let it run 10s
      pause A briefly to force rtsc_min()
      
      At this point we would expect A to continue at 20mbit after a brief
      moment at 80mbit. Instead, A will use 80mbit for a full 10s again. This
      is the effect of first correcting A (during 'start A') and then, after
      unpausing, calculating rtsc_min() from the old corrected curve and the
      new uncorrected one.
      
      The patch fixes this bug and keeps vt and fsc in sync (virtual times
      are cumulative, not local to the backlog period).
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 Jul 2016, 2 commits
    • net_sched: get rid of struct tcf_common · ec0595cc
      Committed by WANG Cong
      After the previous patch, struct tc_action should be enough to
      represent a generic tc action; tcf_common is no longer necessary. This
      patch gets rid of it to make the tc action code more readable.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move tc_action into tcf_common · a85a970a
      Committed by WANG Cong
      struct tc_action is confusing; currently we use it for two purposes:
      1) Passing in arguments to, and carrying results out of, helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of that use by moving
      tc_action into tcf_common, so that they are allocated together in the
      hashtable and can be cast easily.
      
      Together with the following patch, we can then really make tc_action a
      generic representation for all tc actions, with each type of action
      inheriting from it.
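
      As a rough illustration of the embed-and-cast idea described above, here
      is a minimal userspace sketch; the field names and the container_of
      definition are illustrative stand-ins, not the actual kernel code:

      #include <stdio.h>
      #include <stddef.h>

      struct tc_action {              /* generic representation */
              int type;
      };

      struct tcf_common {             /* per-action state, embeds the generic part first */
              struct tc_action common;
              int refcnt;
      };

      /* recover the outer struct from a pointer to the embedded member */
      #define container_of(ptr, type, member) \
              ((type *)((char *)(ptr) - offsetof(type, member)))

      int main(void)
      {
              struct tcf_common c = { .common = { .type = 1 }, .refcnt = 2 };
              struct tc_action *a = &c.common;   /* pass around only the generic part */
              struct tcf_common *back = container_of(a, struct tcf_common, common);

              printf("type=%d refcnt=%d\n", back->common.type, back->refcnt);
              return 0;
      }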
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 25 Jul 2016, 2 commits
  5. 19 Jul 2016, 1 commit
  6. 09 Jul 2016, 1 commit
    • hfsc: reduce hfsc_sched to 14 cachelines · bba7eb5d
      Committed by Florian Westphal
      hfsc_sched is huge (size: 920 bytes, cachelines: 15), but we can get it
      down to 14 cachelines by placing level after filter_cnt (covering a
      4-byte hole) and reducing period/nactive/flags to u32 (period is just a
      counter, incremented when a class becomes active -- 2**32 is plenty for
      this purpose; also, long is only 32 bits wide on 32-bit platforms
      anyway).
      
      cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period
      member is already u32, so no precision is lost there either.
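
      The two tricks (fill an alignment hole by reordering members, and shrink
      counters from long to u32) can be illustrated with a toy userspace
      example; the structs below are made up for the demonstration and are not
      the real hfsc_class layout:

      #include <stdio.h>
      #include <stdint.h>

      struct before {                 /* on a 64-bit build: 32 bytes */
              uint32_t      filter_cnt;
              /* 4-byte hole here: the next member needs 8-byte alignment */
              unsigned long period;
              unsigned long nactive;
              uint32_t      level;    /* plus 4 bytes of tail padding */
      };

      struct after {                  /* 16 bytes */
              uint32_t filter_cnt;
              uint32_t level;         /* moved up: covers the hole */
              uint32_t period;        /* just a counter, u32 is plenty */
              uint32_t nactive;
      };

      int main(void)
      {
              printf("before: %zu bytes, after: %zu bytes\n",
                     sizeof(struct before), sizeof(struct after));
              return 0;
      }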
      
      Cc: Michal Soltys <soltys@ziu.info>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Jul 2016, 2 commits
  8. 02 Jul 2016, 2 commits
  9. 01 Jul 2016, 5 commits
  10. 29 Jun 2016, 1 commit
  11. 26 Jun 2016, 4 commits
    • net_sched: generalize bulk dequeue · 4d202a0d
      Committed by Eric Dumazet
      When qdisc bulk dequeue was added in linux-3.18 (commit
      5772e9a3 "qdisc: bulk dequeue support for qdiscs
      with TCQ_F_ONETXQUEUE"), it was constrained to some
      specific qdiscs.
      
      With some extra care, we can extend this to all qdiscs,
      so that typical traffic shaping solutions can benefit from
      small batches (8 packets in this patch).
      
      For example, HTB is often used on multi-queue devices, and bonding/team
      devices are multi-queue as well...
      
      The idea is to bulk-dequeue packets that map to the same transmit queue.
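
      The general shape of that idea can be modeled in userspace as follows;
      this is only a toy sketch with made-up types (the real patch works on
      skbs inside the qdisc dequeue path and uses different helpers):

      #include <stdio.h>

      #define BATCH_LIMIT 8                    /* as in the patch: small batches */

      struct pkt { int txq; };                 /* stand-in for an skb */

      static struct pkt queue[] = { {0}, {0}, {0}, {1}, {1}, {0} };
      static int head, tail = sizeof(queue) / sizeof(queue[0]);

      static struct pkt *peek(void)    { return head < tail ? &queue[head]   : NULL; }
      static struct pkt *dequeue(void) { return head < tail ? &queue[head++] : NULL; }

      int main(void)
      {
              struct pkt *p;

              while ((p = dequeue()) != NULL) {
                      int batch = 1, txq = p->txq;

                      /* keep pulling packets destined for the same txq */
                      while (batch < BATCH_LIMIT && peek() && peek()->txq == txq) {
                              dequeue();
                              batch++;
                      }
                      printf("sent a batch of %d packet(s) on txq %d\n", batch, txq);
              }
              return 0;
      }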
      
      This brings a 35 to 80 % performance increase for HTB under pressure on
      a bonding setup:
      
      1) NUMA node contention :   610,000 pps -> 1,110,000 pps
      2) No node contention   : 1,380,000 pps -> 1,930,000 pps
      
      Now we should work to add batches on the enqueue() side ;)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_htb: export class backlog in dumps · 338ed9b4
      Committed by Eric Dumazet
      We already get the child qdisc qlen; we can also get its backlog so
      that class dumps can report it.
      
      Also replace qstats with a single drop counter, but move it to a
      separate cache line so that drops do not dirty useful cache lines.
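
      A minimal sketch of the "keep the drop counter on its own cache line"
      idea, using a toy struct (not the real htb_class) and assuming 64-byte
      cache lines:

      #include <stdio.h>
      #include <stdalign.h>

      #define CACHELINE 64

      struct toy_class {
              unsigned long bytes;     /* hot fields, touched on every packet */
              unsigned long packets;

              /* rarely written; isolating it keeps drop accounting from
               * dirtying the cache line holding the hot counters above */
              alignas(CACHELINE) unsigned long drops;
      };

      int main(void)
      {
              printf("sizeof(struct toy_class) = %zu\n", sizeof(struct toy_class));
              return 0;
      }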
      
      Tested:
      
      $ tc -s cl sh dev eth0
      class htb 1:1 root leaf 3: prio 0 rate 1Gbit ceil 1Gbit burst 500000b cburst 500000b
       Sent 2183346912 bytes 9021815 pkt (dropped 2340774, overlimits 0 requeues 0)
       rate 1001Mbit 517543pps backlog 120758b 499p requeues 0
       lended: 9021770 borrowed: 0 giants: 0
       tokens: 9 ctokens: 9
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: fq_codel: cache skb->truesize into skb->cb · 008830bc
      Committed by Eric Dumazet
      Now that we defer skb drops, it makes sense to keep a copy of
      skb->truesize in struct codel_skb_cb to avoid one cache-line miss per
      dropped skb in fq_codel_drop(), reducing latencies a bit further.
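
      A toy userspace model of the caching idea: copy the value into the small
      control-block area once at enqueue time, so the drop path only touches
      memory it is already reading. Names and types below are illustrative,
      not the kernel's:

      #include <stdio.h>

      struct toy_skb {
              unsigned int truesize;
              char cb[48];                     /* per-layer scratch space, as on real skbs */
      };

      struct toy_codel_cb {
              unsigned int mem_usage;          /* cached copy of truesize */
      };

      static void enqueue(struct toy_skb *skb)
      {
              /* copy once at enqueue time... */
              ((struct toy_codel_cb *)skb->cb)->mem_usage = skb->truesize;
      }

      static unsigned int drop_cost(const struct toy_skb *skb)
      {
              /* ...so the drop path reads the cb, not skb->truesize */
              return ((const struct toy_codel_cb *)skb->cb)->mem_usage;
      }

      int main(void)
      {
              struct toy_skb skb = { .truesize = 2048 };

              enqueue(&skb);
              printf("dropping accounts for %u bytes\n", drop_cost(&skb));
              return 0;
      }
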
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: drop packets after root qdisc lock is released · 520ac30f
      Committed by Eric Dumazet
      Qdisc performance suffers when packets are dropped at enqueue() time,
      because the drops (kfree_skb()) are done while the qdisc lock is held,
      delaying a dequeue() from draining the queue.
      
      Nominal throughput can be reduced by 50 % when this happens, at exactly
      the time we would like dequeue() to proceed as fast as possible.
      
      Even FQ is vulnerable to this problem, even though one of FQ's goals
      was to provide some flow isolation.
      
      This patch adds a 'struct sk_buff **to_free' parameter to all
      qdisc->enqueue() methods and to the qdisc_drop() helper.
      
      I measured a performance increase of up to 12 %, but this patch is
      mainly a prerequisite so that future batching in enqueue() can fly.
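
      The deferred-free pattern described above can be sketched in userspace
      roughly as follows; the types and helper names are simplified stand-ins
      for the kernel API (in the kernel, dropped skbs are chained onto
      *to_free under the lock and freed by the caller afterwards):

      #include <stdio.h>
      #include <stdlib.h>

      struct pkt { int id; struct pkt *next; };

      /* like the drop helper: only link the packet, no freeing under the lock */
      static void drop_deferred(struct pkt *p, struct pkt **to_free)
      {
              p->next = *to_free;
              *to_free = p;
      }

      /* called only after the qdisc lock has been released */
      static void free_list(struct pkt *list)
      {
              while (list) {
                      struct pkt *next = list->next;
                      printf("freeing packet %d outside the lock\n", list->id);
                      free(list);
                      list = next;
              }
      }

      int main(void)
      {
              struct pkt *to_free = NULL;

              /* lock(); -- enqueue path runs with the root qdisc lock held */
              for (int i = 0; i < 3; i++) {
                      struct pkt *p = malloc(sizeof(*p));

                      p->id = i;
                      drop_deferred(p, &to_free);   /* queue full: defer the drop */
              }
              /* unlock(); -- now the expensive frees happen */

              free_list(to_free);
              return 0;
      }
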
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 24 Jun 2016, 3 commits
  13. 17 Jun 2016, 1 commit
  14. 16 Jun 2016, 13 commits