1. 26 June 2016, 3 commits
• net_sched: generalize bulk dequeue · 4d202a0d
  Authored by Eric Dumazet
      When qdisc bulk dequeue was added in linux-3.18 (commit
      5772e9a3 "qdisc: bulk dequeue support for qdiscs
      with TCQ_F_ONETXQUEUE"), it was constrained to some
      specific qdiscs.
      
      With some extra care, we can extend this to all qdiscs,
      so that typical traffic shaping solutions can benefit from
      small batches (8 packets in this patch).
      
For example, HTB is often used on a multi-queue device,
and bonding/team are multi-queue devices...
      
The idea is to bulk-dequeue packets that map to the same transmit queue.
      
This brings a 35 to 80% performance increase in an HTB setup
under pressure on a bonding setup:
      
      1) NUMA node contention :   610,000 pps -> 1,110,000 pps
      2) No node contention   : 1,380,000 pps -> 1,930,000 pps
      
      Now we should work to add batches on the enqueue() side ;)
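
A minimal sketch of the dequeue-side idea in plain C (struct fifo,
struct pkt and the helpers below are illustrative stand-ins, not the
kernel's skb/Qdisc API): keep dequeuing while the next packet maps to
the same txq as the first one, up to the batch budget (8 in this patch).

    #include <stddef.h>

    struct pkt { struct pkt *next; int txq; };
    struct fifo { struct pkt *head; };

    static struct pkt *fifo_peek(struct fifo *q) { return q->head; }

    static struct pkt *fifo_dequeue(struct fifo *q)
    {
        struct pkt *p = q->head;
        if (p) {
            q->head = p->next;
            p->next = NULL;
        }
        return p;
    }

    /* Dequeue up to 'budget' packets that all map to the same transmit
     * queue as the first one; stop early on a txq change so the batch
     * can be handed to one netdev queue in a single xmit pass. */
    static struct pkt *bulk_dequeue(struct fifo *q, int budget)
    {
        struct pkt *head = fifo_dequeue(q);
        struct pkt *tail = head;

        if (!head)
            return NULL;
        while (--budget > 0 && fifo_peek(q) &&
               fifo_peek(q)->txq == head->txq) {
            tail->next = fifo_dequeue(q);
            tail = tail->next;
        }
        return head;    /* caller transmits the whole chain */
    }

The real patch also has to validate queue state and handle GSO
segments; this sketch only shows the same-txq batching rule.
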
Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net_sched: fq_codel: cache skb->truesize into skb->cb · 008830bc
  Authored by Eric Dumazet
Now that we defer skb drops, it makes sense to keep a copy
of skb->truesize in struct codel_skb_cb to avoid one
cache line miss per dropped skb in fq_codel_drop(),
reducing latencies a bit further.
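
A hedged sketch of the caching pattern (struct and field names here are
illustrative, not the exact kernel definitions): copy truesize into the
packet's control block at enqueue time, while its cache line is hot, so
the later drop path only touches the cb.

    struct pkt {
        unsigned int truesize;  /* lives in the skb's own cache line */
        char cb[48];            /* control block, as in skb->cb */
    };

    struct codel_cb_lite { unsigned int cached_truesize; };

    static void on_enqueue(struct pkt *p)
    {
        /* One extra store while the packet is already in cache... */
        ((struct codel_cb_lite *)p->cb)->cached_truesize = p->truesize;
    }

    static unsigned int drop_charge(const struct pkt *p)
    {
        /* ...saves a cache line miss per dropped packet here. */
        return ((const struct codel_cb_lite *)p->cb)->cached_truesize;
    }
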
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net_sched: drop packets after root qdisc lock is released · 520ac30f
  Authored by Eric Dumazet
Qdisc performance suffers when packets are dropped at enqueue()
time, because drops (kfree_skb()) are done while the qdisc lock is
held, delaying a dequeue() that could be draining the queue.
      
Nominal throughput can be reduced by 50% when this happens,
at a time when we would like the dequeue() to proceed as fast as possible.
      
Even FQ is vulnerable to this problem, although one of FQ's goals
was to provide some flow isolation.
      
This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue() implementations and to the qdisc_drop() helper.
      
I measured a performance increase of up to 12%, but this patch
is a prereq so that future batches on the enqueue() side can fly.
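
A hedged sketch of the deferral pattern with simplified types (the
kernel threads struct sk_buff **to_free through enqueue() and
qdisc_drop()): chain doomed packets under the lock, free them after
it is released.

    #include <stdlib.h>

    struct pkt { struct pkt *next; };

    /* Called with the qdisc lock held: no freeing here, just chain it. */
    static void drop_deferred(struct pkt *p, struct pkt **to_free)
    {
        p->next = *to_free;
        *to_free = p;
    }

    /* Called after the qdisc lock is released: free the whole chain. */
    static void free_deferred(struct pkt *to_free)
    {
        while (to_free) {
            struct pkt *next = to_free->next;
            free(to_free);
            to_free = next;
        }
    }

The caller pattern is then: take the lock, enqueue with &to_free,
release the lock, free_deferred(to_free), so the freeing work no
longer delays dequeue().
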
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2. 24 June 2016, 1 commit
3. 19 June 2016, 1 commit
4. 18 June 2016, 8 commits
5. 16 June 2016, 13 commits
6. 15 June 2016, 1 commit
• netfilter: nf_tables: reject loops from set element jump to chain · 8588ac09
  Authored by Pablo Neira Ayuso
      Liping Zhang says:
      
      "Users may add such a wrong nft rules successfully, which will cause an
      endless jump loop:
      
        # nft add rule filter test tcp dport vmap {1: jump test}
      
This is because, before we commit, the element in the current anonymous
set is inactive, so ops->walk will skip this element and miss the
validation check."
      
      To resolve this problem, this patch passes the generation mask to the
      walk function through the iter container structure depending on the code
      path:
      
      1) If we're dumping the elements, then we have to check if the element
         is active in the current generation. Thus, we check for the current
         bit in the genmask.
      
      2) If we're checking for loops, then we have to check if the element is
         active in the next generation, as we're in the middle of a
         transaction. Thus, we check for the next bit in the genmask.
      
      Based on original patch from Liping Zhang.
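
A hedged sketch of the two code paths (the encoding below is
illustrative; nf_tables' real genmask handling differs in detail): the
iterator carries the generation bit to test, so dumps see the current
generation while loop checks see the next one.

    #include <stdbool.h>

    enum { GEN_CURRENT = 0x1, GEN_NEXT = 0x2 };

    struct elem_lite { unsigned int active; };        /* per-generation bits */
    struct set_iter_lite { unsigned int genmask; };   /* set per code path */

    static bool elem_visible(const struct elem_lite *e,
                             const struct set_iter_lite *it)
    {
        return (e->active & it->genmask) != 0;
    }

    /* 1) dump path:       iter.genmask = GEN_CURRENT
     * 2) loop-check path: iter.genmask = GEN_NEXT, so an element added
     *    in the still-uncommitted transaction is validated as well. */
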
Reported-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>
7. 11 June 2016, 2 commits
8. 10 June 2016, 2 commits
9. 09 June 2016, 8 commits
• mac80211: implement codel on fair queuing flows · 5caa328e
  Authored by Michal Kazior
When using software queuing there is no limit
other than a global packet count limit. This
means a single flow queue can grow insanely
long. This is particularly bad for TCP
congestion algorithms, which require a somewhat
more sophisticated frame-dropping scheme than a
mere head-drop on limit overflow.
      
Hence, apply CoDel5 (slightly modified to fit
the knobs) on flow queues. This improves TCP
convergence and stability when combined with a
wireless driver that keeps its own tx queue/fifo
at a minimum fill level for the given link
conditions.
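
A hedged sketch of the core CoDel test (heavily
simplified; CoDel5's control law and the
mac80211 knobs are more involved): drop from the
head only once the head packet's sojourn time
has stayed above target for a full interval.

    #include <stdbool.h>
    #include <stdint.h>

    struct codel_lite {
        uint64_t first_above;   /* when sojourn first exceeded target; 0 = never */
    };

    static bool codel_should_drop(struct codel_lite *st, uint64_t now,
                                  uint64_t sojourn, uint64_t target,
                                  uint64_t interval)
    {
        if (sojourn < target) {
            st->first_above = 0;        /* queue drained: reset the state */
            return false;
        }
        if (!st->first_above) {
            st->first_above = now;      /* start the grace interval */
            return false;
        }
        /* Standing queue for at least one interval: drop at head. */
        return now - st->first_above >= interval;
    }
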
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
• mac80211: skip netdev queue control with software queuing · 80a83cfc
  Authored by Michal Kazior
Qdiscs are designed with no regard for 802.11
aggregation requirements and hand packets out
one by one with no guarantee they are destined
to the same tid. This does more harm than good
no matter how fairly a given qdisc may behave on
an ethernet interface.
      
Software queuing used per-AC netdev subqueue
congestion control whenever a global AC limit
was hit. In practice this meant a single station
or tid queue could starve others rather easily.
This could resonate with qdiscs in a bad way or
just end up with poor aggregation performance.
Increasing the AC limit would increase induced
latency, which is also bad.
      
Disabling qdiscs by default and performing
taildrop instead of netdev subqueue congestion
control, on the other hand, makes it possible
for tid queues to fill up "in the meantime"
while preventing stations from starving each
other.

This increases aggregation opportunities and
should allow software-queuing based drivers to
achieve better performance by utilizing airtime
more efficiently with big aggregates.
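
A hedged sketch of the policy difference
(illustrative only): instead of stopping a
shared netdev subqueue when a global AC limit is
hit, only the over-limit tid queue taildrops,
and every other queue keeps filling.

    #include <stdbool.h>

    struct tid_queue { int len, limit; };

    /* Per-tid taildrop: reject only the newcomer on the one full
     * queue. No shared subqueue is stopped, so other stations/tids
     * keep queuing and aggregation opportunities keep growing. */
    static bool tid_enqueue(struct tid_queue *q)
    {
        if (q->len >= q->limit)
            return false;   /* taildrop */
        q->len++;
        return true;
    }
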
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
• sched: place state, next_sched and gso_skb in same cacheline again · c8945043
  Authored by Florian Westphal
Earlier commits removed two members from struct Qdisc, which placed
next_sched/gso_skb in a different cacheline than ->state.
      
      This restores the struct layout to what it was before the removal.
      Move the two members, then add an annotation so they all reside in the
      same cacheline.
      
This adds a 16-byte hole after cpu_qstats.

The hole could be closed, but since that wouldn't decrease the total
struct size, just do it this way.
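
A hedged illustration in standalone C11 (the kernel spells this
____cacheline_aligned_in_smp; member names are placeholders): aligning
state to a cache-line boundary pulls state, gso_skb and next_sched onto
one 64-byte line, leaving the described hole before it.

    #include <stdalign.h>
    #include <stddef.h>
    #include <stdio.h>

    struct qdisc_lite {
        void *cpu_bstats;
        void *cpu_qstats;
        /* ~16-byte hole here; fine, since closing it wouldn't shrink
         * the total struct size, and the alignment wins on the hot path. */
        alignas(64) unsigned long state;
        void *gso_skb;
        struct qdisc_lite *next_sched;
    };

    int main(void)
    {
        printf("state offset: %zu (cache-line aligned)\n",
               offsetof(struct qdisc_lite, state));
        return 0;
    }
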
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
• sched: remove qdisc->drop · a09ceb0e
  Authored by Florian Westphal
After the removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler, there
are no more callers of ->drop() outside of other ->drop() functions,
i.e. nothing calls them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
• sched: remove qdisc_reshape_fail · c3a173d7
  Authored by Florian Westphal
After the removal of TCA_CBQ_POLICE from the cbq scheduler,
qdisc->reshape_fail is always NULL, i.e. qdisc_reshape_fail is now the
same as qdisc_drop.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
• cbq: remove TCA_CBQ_POLICE support · dd47c1fa
  Authored by Florian Westphal
iproute2 doesn't implement any cbq option that results in this attribute
being sent to the kernel.
      
To make use of it, a user would have to:
      
      - patch iproute2
      - add a class
      - attach a qdisc to the class (default pfifo doesn't work as
        q->handle is 0 and cbq_set_police() is a no-op in this case)
      - re-'add' the same class (tc class change ...) again
- user must also specify a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
      - the added qdisc must be one of bfifo, pfifo or netem
      
If all of these conditions are met and one of the supported leaf
qdiscs (p/bfifo, netem, plug or tbf) would drop a packet, the kernel
calls back into cbq, which will attempt to re-queue the skb into a
different class as indicated by the parent's defmap entry for
TC_PRIO_BESTEFFORT.

[ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].
      
This feature, which isn't documented or implemented in iproute2,
and isn't implemented consistently (most qdiscs, like sfq, codel, etc.,
drop right away instead of attempting this reclassification), is the
sole reason for the reshape_fail and __parent members in the Qdisc struct.
      
So remove TCA_CBQ_POLICE support from the kernel, reject it via
EOPNOTSUPP so userspace knows we don't support it, and then remove the
no-longer-needed infrastructure in a followup commit.
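
For reference, a hedged sketch of the callback being removed
(simplified; not the actual cbq/Qdisc code): a leaf qdisc about to drop
asks its parent's reshape_fail to re-queue the packet into another
class per the defmap.

    struct pkt_r;

    struct qdisc_lite2 {
        struct qdisc_lite2 *parent;     /* the __parent member's role */
        /* Installed only by cbq's police mode; returns 0 if requeued. */
        int (*reshape_fail)(struct pkt_r *p, struct qdisc_lite2 *parent);
    };

    static int leaf_overlimit(struct qdisc_lite2 *q, struct pkt_r *p)
    {
        if (q->parent && q->parent->reshape_fail)
            return q->parent->reshape_fail(p, q->parent);
        return -1;      /* no hook installed: plain drop */
    }
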
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: Add l3mdev rule · 96c63fa7
  Authored by David Ahern
Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases, this brings scalability
issues with the growing rule list. All of the VRF rules have the same
format, with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
lookup, the VRF rules can be consolidated into a single rule that pulls
the table from the VRF device.
      
      This patch introduces a new rule attribute l3mdev. The l3mdev rule
      means the table id used for the lookup is pulled from the L3 master
      device (e.g., VRF) rather than being statically defined. With the
      l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev
      rule per address family (IPv4 and IPv6).
      
If an admin wishes to insert higher-priority rules for specific VRFs,
those rules will co-exist with the l3mdev rule. This capability means
current VRF scripts will co-exist with this new, simpler implementation.
      
Currently, the rules list for both ipv4 and ipv6 looks like this:
          $ ip  ru ls
          1000:       from all oif vrf1 lookup 1001
          1000:       from all iif vrf1 lookup 1001
          1000:       from all oif vrf2 lookup 1002
          1000:       from all iif vrf2 lookup 1002
          1000:       from all oif vrf3 lookup 1003
          1000:       from all iif vrf3 lookup 1003
          1000:       from all oif vrf4 lookup 1004
          1000:       from all iif vrf4 lookup 1004
          1000:       from all oif vrf5 lookup 1005
          1000:       from all iif vrf5 lookup 1005
          1000:       from all oif vrf6 lookup 1006
          1000:       from all iif vrf6 lookup 1006
          1000:       from all oif vrf7 lookup 1007
          1000:       from all iif vrf7 lookup 1007
          1000:       from all oif vrf8 lookup 1008
          1000:       from all iif vrf8 lookup 1008
          ...
          32765:      from all lookup local
          32766:      from all lookup main
          32767:      from all lookup default
      
      With the l3mdev rule the list is just the following regardless of the
      number of VRFs:
          $ ip ru ls
          1000:       from all lookup [l3mdev table]
          32765:      from all lookup local
          32766:      from all lookup main
          32767:      from all lookup default
      
(Note: the above pretty-print of the rule is based on an iproute2
       prototype. Actual verbiage may change.)
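
A hedged sketch of the lookup semantics with illustrative types (not
the kernel's fib_rules code): an ordinary rule carries a fixed table
id, while an l3mdev rule resolves it at lookup time from the flow's L3
master device.

    struct netdev_lite { unsigned int l3mdev_table; }; /* e.g. the VRF's table */

    struct fib_rule_lite {
        int l3mdev;             /* the new rule attribute */
        unsigned int table;     /* static table id for ordinary rules */
    };

    static unsigned int rule_table(const struct fib_rule_lite *r,
                                   const struct netdev_lite *dev)
    {
        if (r->l3mdev && dev)
            return dev->l3mdev_table;   /* pulled from the oif/iif device */
        return r->table;                /* statically defined */
    }

iproute2 later grew an l3mdev keyword for this (roughly:
ip rule add l3mdev pref 1000), consistent with the prototype note above.
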
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: dsa: Initialize CPU port ethtool ops per tree · 0c73c523
  Authored by Florian Fainelli
Now that we can properly support multiple distinct trees in the system,
the global variable dsa_cpu_port_ethtool_ops gets clobbered as soon as
the second switch tree gets probed, and we don't want that.
      
We need to move this to be dynamically allocated, and since we can't
really compare addresses anymore to determine first-time initialization
versus any other time, just move this to dsa.c and dsa2.c, where the
remainder of the dst/ds initialization happens.
      
The operations teardown restores the master netdev's ethtool_ops to its
original ethtool_ops pointer (typically within the Ethernet driver).
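
A hedged sketch of the ownership change with simplified types (not the
actual dsa.c/dsa2.c code): each tree allocates its own copy of the
master netdev's ethtool ops and remembers the original pointer so
teardown can restore it.

    #include <stdlib.h>

    struct ethtool_ops_lite { void (*get_drvinfo)(void *info); };

    struct dsa_tree_lite {
        struct ethtool_ops_lite *cpu_port_ops;      /* per-tree copy */
        const struct ethtool_ops_lite *orig_ops;    /* saved for teardown */
    };

    static int tree_ethtool_setup(struct dsa_tree_lite *dst,
                                  const struct ethtool_ops_lite **master_ops)
    {
        dst->cpu_port_ops = malloc(sizeof(*dst->cpu_port_ops));
        if (!dst->cpu_port_ops)
            return -1;
        *dst->cpu_port_ops = **master_ops;  /* copy, then override as needed */
        dst->orig_ops = *master_ops;        /* remember the driver's ops */
        *master_ops = dst->cpu_port_ops;    /* install the per-tree copy */
        return 0;
    }

    static void tree_ethtool_teardown(struct dsa_tree_lite *dst,
                                      const struct ethtool_ops_lite **master_ops)
    {
        *master_ops = dst->orig_ops;        /* restore original ethtool_ops */
        free(dst->cpu_port_ops);
    }
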
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
10. 08 June 2016, 1 commit