1. 18 12月, 2013 2 次提交
  2. 14 12月, 2013 1 次提交
  3. 12 12月, 2013 2 次提交
  4. 11 12月, 2013 5 次提交
  5. 10 12月, 2013 1 次提交
  6. 07 12月, 2013 1 次提交
  7. 06 12月, 2013 5 次提交
  8. 01 12月, 2013 3 次提交
  9. 24 11月, 2013 1 次提交
    • E
      sch_tbf: handle too small burst · 4d0820cf
      Eric Dumazet 提交于
      If a too small burst is inadvertently set on TBF, we might trigger
      a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a
      qdisc_reshape_fail() call.
      
      tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate
      50mbit
      
      Fix the bug, and add a warning, as such configuration is not
      going to work anyway for non GSO packets.
      
      (For some reason, one has to use a burst >= 1520 to get a working
      configuration, even with old kernels. This is a probable iproute2/tc
      bug)
      
      Based on a report and initial patch from Yang Yingliang
      
      Fixes: e43ac79a ("sch_tbf: segment too big GSO packets")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d0820cf
  10. 16 11月, 2013 2 次提交
    • E
      pkt_sched: fq: fix pacing for small frames · f52ed899
      Eric Dumazet 提交于
      For performance reasons, sch_fq tried hard to not setup timers for every
      sent packet, using a quantum based heuristic : A delay is setup only if
      the flow exhausted its credit.
      
      Problem is that application limited flows can refill their credit
      for every queued packet, and they can evade pacing.
      
      This problem can also be triggered when TCP flows use small MSS values,
      as TSO auto sizing builds packets that are smaller than the default fq
      quantum (3028 bytes)
      
      This patch adds a 40 ms delay to guard flow credit refill.
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f52ed899
    • E
      pkt_sched: fq: warn users using defrate · 65c5189a
      Eric Dumazet 提交于
      Commit 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing")
      obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.
      
      Suggested by David Miller
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65c5189a
  11. 15 11月, 2013 1 次提交
  12. 10 11月, 2013 1 次提交
  13. 08 11月, 2013 1 次提交
  14. 30 10月, 2013 1 次提交
    • D
      net: sched: cls_bpf: add BPF-based classifier · 7d1d65cb
      Daniel Borkmann 提交于
      This work contains a lightweight BPF-based traffic classifier that can
      serve as a flexible alternative to ematch-based tree classification, i.e.
      now that BPF filter engine can also be JITed in the kernel. Naturally, tc
      actions and policies are supported as well with cls_bpf. Multiple BPF
      programs/filter can be attached for a class, or they can just as well be
      written within a single BPF program, that's really up to the user how he
      wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
      The notion of a BPF program's return/exit codes is being kept as follows:
      
           0: No match
          -1: Select classid given in "tc filter ..." command
        else: flowid, overwrite the default one
      
      As a minimal usage example with iproute2, we use a 3 band prio root qdisc
      on a router with sfq each as leave, and assign ssh and icmp bpf-based
      filters to band 1, http traffic to band 2 and the rest to band 3. For the
      first two bands we load the bytecode from a file, in the 2nd we load it
      inline as an example:
      
      echo 1 > /proc/sys/net/core/bpf_jit_enable
      
      tc qdisc del dev em1 root
      tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      
      tc qdisc add dev em1 parent 1:1 sfq perturb 16
      tc qdisc add dev em1 parent 1:2 sfq perturb 16
      tc qdisc add dev em1 parent 1:3 sfq perturb 16
      
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
      tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
      
      BPF programs can be easily created and passed to tc, either as inline
      'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
      compile opcodes, for example:
      
      1) People familiar with tcpdump-like filters:
      
         tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
      
      2) People that want to low-level program their filters or use BPF
         extensions that lack support by libpcap's compiler:
      
         bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
      
         ssh.ops example code:
         ldh [12]
         jne #0x800, drop
         ldb [23]
         jneq #6, drop
         ldh [20]
         jset #0x1fff, drop
         ldxb 4 * ([14] & 0xf)
         ldh [%x + 14]
         jeq #0x16, pass
         ldh [%x + 16]
         jne #0x16, drop
         pass: ret #-1
         drop: ret #0
      
      It was chosen to load bytecode into tc, since the reverse operation,
      tc filter list dev em1, is then able to show the exact commands again.
      Possible follow-up work could also include a small expression compiler
      for iproute2. Tested with the help of bmon. This idea came up during
      the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
      Eric Dumazet!
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d1d65cb
  15. 28 10月, 2013 1 次提交
  16. 26 10月, 2013 1 次提交
    • H
      netem: markov loss model transition fix · 4a3ad7b3
      Hagen Paul Pfeifer 提交于
      The transition from markov state "3 => lost packets within a burst
      period" to "1 => successfully transmitted packets within a gap period"
      has no *additional* loss event. The loss already happen for transition
      from 1 -> 3, this additional loss will make things go wild.
      
      E.g. transition probabilities:
      
      p13:   10%
      p31:  100%
      
      Expected:
      
      Ploss = p13 / (p13 + p31)
      Ploss = ~9.09%
      
      ... but it isn't. Even worse: we get a double loss - each time.
      So simple don't return true to indicate loss, rather break and return
      false.
      Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Stefano Salsano <stefano.salsano@uniroma2.it>
      Cc: Fabio Ludovici <fabio.ludovici@yahoo.it>
      Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a3ad7b3
  17. 19 10月, 2013 1 次提交
  18. 12 10月, 2013 2 次提交
  19. 09 10月, 2013 4 次提交
  20. 08 10月, 2013 1 次提交
    • E
      net: Separate the close_list and the unreg_list v2 · 5cde2829
      Eric W. Biederman 提交于
      Separate the unreg_list and the close_list in dev_close_many preventing
      dev_close_many from permuting the unreg_list.  The permutations of the
      unreg_list have resulted in cases where the loopback device is accessed
      it has been freed in code such as dst_ifdown.  Resulting in subtle memory
      corruption.
      
      This is the second bug from sharing the storage between the close_list
      and the unreg_list.  The issues that crop up with sharing are
      apparently too subtle to show up in normal testing or usage, so let's
      forget about being clever and use two separate lists.
      
      v2: Make all callers pass in a close_list to dev_close_many
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5cde2829
  21. 02 10月, 2013 1 次提交
    • E
      pkt_sched: fq: rate limiting improvements · 0eab5eb7
      Eric Dumazet 提交于
      FQ rate limiting suffers from two problems, reported
      by Steinar :
      
      1) FQ enforces a delay when flow quantum is exhausted in order
      to reduce cpu overhead. But if packets are small, current
      delay computation is slightly wrong, and observed rates can
      be too high.
      
      Steinar had this problem because he disabled TSO and GSO,
      and default FQ quantum is 2*1514.
      
      (Of course, I wish recent TSO auto sizing changes will help
      to not having to disable TSO in the first place)
      
      2) maxrate was not used for forwarded flows (skbs not attached
      to a socket)
      
      Tested:
      
      tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
      netperf -H lpq84 -l 1000 &
      sleep 10 ; tc -s qdisc show dev eth0
      qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
       quantum 3028 initial_quantum 15140 maxrate 8000Kbit
       Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
       rate 7831Kbit 653pps backlog 7570b 5p requeues 0
        44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
        0 gc, 0 highprio, 5545 throttled
      
      lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
      09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
      09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
      09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
      09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>
      Reported-by: NSteinar H. Gunderson <sesse@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eab5eb7
  22. 01 10月, 2013 2 次提交