1. 04 Jul, 2013 1 commit
  2. 02 Jul, 2013 1 commit
    • E
      netem: use rb tree to implement the time queue · aec0a40a
      Authored by Eric Dumazet
      The following typical setup, implementing a ~100 ms RTT and a large
      amount of reordering, performs very poorly because netem
      implements the time queue using a linked list.
      -----------------------------------------------------------
      ETH=eth0
      IFB=ifb0
      modprobe ifb
      ip link set dev $IFB up
      tc qdisc add dev $ETH ingress 2>/dev/null
      tc filter add dev $ETH parent ffff: \
         protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
         redirect dev $IFB
      ethtool -K $ETH gro off tso off gso off
      tc qdisc add dev $IFB root netem delay 50ms 10ms limit 100000
      tc qd add dev $ETH root netem delay 50ms limit 100000
      ---------------------------------------------------------
      
      Switch the netem time queue to an rb tree, so this kind of setup can
      work at high speed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aec0a40a
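      The cost difference can be sketched outside the kernel: with a jittered
      delay, packets arrive with out-of-order send times, and a sorted
      structure gives O(log n) insertion where a linked list needs an O(n)
      scan to find the insertion point. A minimal Python sketch, with heapq
      standing in for the kernel rbtree (all names are illustrative, not
      netem's actual code):

```python
import heapq

class TimeQueue:
    """Time-ordered packet queue; heapq gives O(log n) insertion,
    analogous to the rbtree netem switched to (a linked list would
    need an O(n) scan to find the insertion point)."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps insertion order for equal times

    def enqueue(self, time_to_send, pkt):
        heapq.heappush(self._heap, (time_to_send, self._seq, pkt))
        self._seq += 1

    def dequeue(self):
        # pop the packet with the earliest time_to_send
        return heapq.heappop(self._heap)[2]

q = TimeQueue()
# jittered delays mean packets arrive with out-of-order send times
for t, pkt in [(105, "a"), (98, "b"), (130, "c"), (99, "d")]:
    q.enqueue(t, pkt)
order = [q.dequeue() for _ in range(4)]
print(order)  # ['b', 'd', 'a', 'c']
```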
  3. 30 Jan, 2013 1 commit
    • J
      netem: fix delay calculation in rate extension · a13d3104
      Authored by Johannes Naab
      The delay calculation with the rate extension introduced in v3.3 does
      not work properly if other packets are still queued for transmission.
      For the delay calculation to work, the two delay types (latency and the
      delay introduced by rate limitation) have to be handled differently. The
      latency delay for a packet can overlap with the delay of other packets.
      The delay introduced by the rate, however, is separate, and can only
      start once all other rate-introduced delays have finished.
      
      Latency delay is drawn from the same distribution for each packet; rate
      delay depends on the packet size.
      
      .: latency delay
      -: rate delay
      x: additional delay we have to wait since another packet is currently
         transmitted
      
        .....----                    Packet 1
          .....xx------              Packet 2
                     .....------     Packet 3
          ^^^^^
          latency stacks
               ^^
               rate delay doesn't stack
                     ^^
                     latency stacks
      
        -----> time
      
      When a packet is enqueued, we first consider the latency delay. If other
      packets are already queued, we can reduce the latency delay until the
      last packet in the queue is sent; however, the latency delay cannot be
      <0, since this would mean that the rate is overcommitted.  The new
      reference point is the time at which the last packet will be sent. To
      find the time when the packet should be sent, the rate-introduced delay
      has to be added on top of that.
      Signed-off-by: Johannes Naab <jn@stusta.de>
      Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a13d3104
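      The rule above can be modeled in a few lines of Python (a sketch of the
      computation, not the kernel code; all names are illustrative):

```python
def send_time(now, latency, pkt_len, rate, last_send_time):
    """Model of the fixed delay calculation: latency delay may
    overlap with already-queued packets, but the rate-introduced
    (serialization) delay only starts once the previous packet
    has fully left the link."""
    # latency may overlap with queued packets, but the reference
    # point can never be earlier than the last packet's send time
    # (that would mean the rate is overcommitted)
    start = max(now + latency, last_send_time)
    # the rate-introduced delay is added on top of that
    return start + pkt_len / rate

# 1000-byte packets on a 1000 byte/s link with 0.5 s latency:
t1 = send_time(0.0, 0.5, 1000, 1000.0, 0.0)   # 0.5 + 1.0 = 1.5
t2 = send_time(0.1, 0.5, 1000, 1000.0, t1)    # latency overlaps: 1.5 + 1.0 = 2.5
print(t1, t2)
```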
  4. 17 Jul, 2012 1 commit
    • E
      netem: refine early skb orphaning · 5a308f40
      Authored by Eric Dumazet
      netem does an early orphaning of skbs. Doing so breaks TCP Small Queue
      or any mechanism relying on socket sk_wmem_alloc feedback.
      
      Ideally, we should perform this orphaning after the rate module and
      before the delay module, to mimic what happens on a real link:
      
      skb orphaning is indeed normally done at TX completion, before the
      transit on the link.
      
      +-------+   +--------+  +---------------+  +-----------------+
      + Qdisc +---> Device +--> TX completion +--> links / hops    +->
      +       +   +  xmit  +  + skb orphaning +  + propagation     +
      +-------+   +--------+  +---------------+  +-----------------+
            < rate limiting >                  < delay, drops, reorders >
      
      If netem is used without delay feature (drops, reorders, rate
      limiting), then we should avoid early skb orphaning, to keep pressure
      on sockets as long as packets are still in qdisc queue.
      
      Ideally, netem should be refactored to implement the delay module
      as the last stage. The current algorithm merges the two phases
      (rate limiting + delay), so it's not correct.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hagen Paul Pfeifer <hagen@jauu.net>
      Cc: Mark Gordon <msg@google.com>
      Cc: Andreas Terzis <aterzis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a308f40
  5. 09 Jul, 2012 1 commit
    • E
      netem: add limitation to reordered packets · 960fb66e
      Authored by Eric Dumazet
      Fix two netem bugs:
      
      1) When a frame was dropped by tfifo_enqueue(), drop counter
         was incremented twice.
      
      2) When reordering is triggered, we enqueue a packet without
         checking the queue limit. This can OOM pretty fast when repeated
         enough; since the skbs are orphaned, no socket limit can help in
         this situation.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Mark Gordon <msg@google.com>
      Cc: Andreas Terzis <aterzis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      960fb66e
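      The second bug amounts to a missing limit check on the reorder path. A
      plain-Python sketch of the invariant (illustrative names, not the
      kernel fix itself):

```python
class BoundedQueue:
    """Every enqueue path must honor the queue limit; the bug was
    that the reorder path skipped this check, so orphaned skbs
    could accumulate without bound (no socket limit applies)."""
    def __init__(self, limit):
        self.limit = limit
        self.pkts = []
        self.dropped = 0  # incremented exactly once per dropped packet

    def enqueue(self, pkt, reorder=False):
        # the limit check applies on the reorder path too
        if len(self.pkts) >= self.limit:
            self.dropped += 1  # count the drop once, not twice
            return False
        pos = 0 if reorder else len(self.pkts)  # reorder jumps the queue
        self.pkts.insert(pos, pkt)
        return True

q = BoundedQueue(limit=2)
assert q.enqueue("p1") and q.enqueue("p2", reorder=True)
assert not q.enqueue("p3", reorder=True)  # over limit: dropped
print(q.dropped)  # 1
```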
  6. 02 May, 2012 1 commit
  7. 01 May, 2012 1 commit
  8. 02 Apr, 2012 1 commit
  9. 20 Feb, 2012 1 commit
  10. 10 Feb, 2012 1 commit
  11. 07 Feb, 2012 1 commit
  12. 23 Jan, 2012 1 commit
  13. 06 Jan, 2012 1 commit
  14. 31 Dec, 2011 1 commit
    • E
      netem: fix classful handling · 50612537
      Authored by Eric Dumazet
      Commit 10f6dfcf (Revert "sch_netem: Remove classful functionality")
      reintroduced classful functionality to netem, but broke basic netem
      behavior:
      
      netem uses a t(ime)fifo queue, and stores timestamps in skb->cb[].
      
      If the qdisc is changed, time constraints are not respected and other
      qdiscs can destroy skb->cb[] and block netem at dequeue time.
      
      Fix this by always using the internal tfifo, and optionally attaching
      a child qdisc to netem (or a tree of qdiscs).
      
      Example of use:
      
      DEV=eth3
      tc qdisc del dev $DEV root
      tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
      tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
      	burst 20480 limit 20480 mtu 1514 rate 32000bps
      
      qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
       Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
       rate 18416bit 3pps backlog 0b 0p requeues 0
      qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
       Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
       backlog 0b 5p requeues 0
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50612537
  15. 25 Dec, 2011 1 commit
  16. 24 Dec, 2011 1 commit
  17. 13 Dec, 2011 1 commit
    • H
      netem: add cell concept to simulate special MAC behavior · 90b41a1c
      Authored by Hagen Paul Pfeifer
      This extension can be used to simulate special link layer
      characteristics. It is a simulation because packet data is not
      modified; only the calculation base is changed, so a packet is delayed
      based on the original packet size and artificial cell information.
      
      packet_overhead can be used to simulate a link layer header compression
      scheme (e.g. set packet_overhead to -20) or with a positive
      packet_overhead value an additional MAC header can be simulated. It is
      also possible to "replace" the 14 byte Ethernet header with something
      else.
      
      cell_size and cell_overhead can be used to simulate cell-based link
      layer schemes, like some TDMA schemes. Another application area is MAC
      schemes using link layer fragmentation with a (small) header per
      fragment. Cell size is the maximum amount of data bytes within one
      cell. Cell overhead is an additional variable to change the
      per-cell overhead (e.g. a 5 byte header per fragment).
      
      Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
      cell overhead 5 byte):
      
        tc qdisc add dev eth0 root netem rate 5kbit 20 100 5
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90b41a1c
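      The resulting calculation base can be modeled as follows (a sketch of
      the description above, under the assumption that data is padded to
      full cells; names are illustrative, not the kernel code):

```python
import math

def effective_len(pkt_len, packet_overhead=0, cell_size=0, cell_overhead=0):
    """Calculation base for the rate delay: original packet size
    plus artificial packet overhead, padded to whole cells with a
    per-cell overhead added for each cell."""
    length = pkt_len + packet_overhead  # may be negative overhead (compression)
    if cell_size > 0:
        cells = math.ceil(length / cell_size)       # cells needed for the data
        length = cells * (cell_size + cell_overhead)  # pad + per-cell header
    return length

# 1000 data bytes, 20 byte packet overhead, 100 byte cells, 5 byte per cell:
# 1020 bytes -> 11 cells -> 11 * (100 + 5) = 1155 bytes of calculation base
print(effective_len(1000, 20, 100, 5))  # 1155
```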
  18. 02 Dec, 2011 1 commit
  19. 01 Dec, 2011 1 commit
    • H
      netem: rate extension · 7bc0f28c
      Authored by Hagen Paul Pfeifer
      Currently netem is not able to emulate channel bandwidth. Only a static
      delay (and optional random jitter) can be configured.
      
      To emulate the channel rate, the token bucket filter (sch_tbf) can be
      used. But TBF has some major emulation flaws. The buffer (token bucket
      depth/rate) cannot be 0. Also, the idea behind TBF is that the credit
      (tokens in the bucket) fills up if no packet is transmitted, so that
      there is always a "positive" credit for new packets. In real life this
      behavior contradicts the laws of nature, where nothing can travel
      faster than the speed of light. E.g.: on an emulated 1000 byte/s link,
      a small IPv4/TCP SYN packet of ~50 bytes requires ~0.05 seconds - not
      0 seconds.
      
      Netem is an excellent place to implement a rate limiting feature: static
      delay is already implemented, tfifo already has time information and the
      user can skip TBF configuration completely.
      
      This patch implements the rate feature, which can be configured via
      tc, e.g.:
      
      	tc qdisc add dev eth0 root netem rate 10kbit
      
      To emulate a link of 5000 byte/s with an additional static delay of 10ms:
      
      	tc qdisc add dev eth0 root netem delay 10ms rate 5KBps
      
      Note: similar to TBF, the rate extension is bound to the kernel timing
      system. Depending on the architecture's timer granularity, higher rates
      (e.g. 10mbit/s and higher) tend to produce transmission bursts. Also
      note the further queues living in network adapters; see ethtool(8).
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7bc0f28c
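      The serialization delay the rate extension has to emulate follows
      directly from packet size and rate; a minimal sketch reproducing the
      commit's SYN example (helper names are illustrative):

```python
def tx_delay_s(pkt_len_bytes, rate_bytes_per_s):
    """Serialization delay of one packet: unlike TBF's token credit,
    a packet can never leave faster than the link rate allows."""
    return pkt_len_bytes / rate_bytes_per_s

def kbit_to_bytes_per_s(kbit):
    # tc's "rate 10kbit" means 10 * 1000 bits/s = 1250 bytes/s
    return kbit * 1000 / 8

# the commit's example: a ~50 byte IPv4/TCP SYN on a 1000 byte/s link
delay = tx_delay_s(50, 1000.0)
print(delay)  # 0.05 seconds, not 0 as TBF's credit would allow
```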
  20. 22 Jun, 2011 1 commit
    • A
      net: remove mm.h inclusion from netdevice.h · b7f080cf
      Authored by Alexey Dobriyan
      Remove the linux/mm.h inclusion from netdevice.h -- it's unused (I've checked manually).
      
      To prevent mm.h inclusion via other channels, also extract the "enum
      dma_data_direction" definition into a separate header. This tiny piece
      is what glues netdevice.h to mm.h via
      "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h".
      Removal of mm.h from scatterlist.h was tried and found not feasible
      on most archs, so the link was cut off earlier.
      
      Hope people are OK with the tiny include file.
      
      Note, that mm_types.h is still dragged in, but it is a separate story.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7f080cf
  21. 31 Mar, 2011 1 commit
  22. 25 Feb, 2011 7 commits
  23. 21 Jan, 2011 2 commits
    • E
      net_sched: accurate bytes/packets stats/rates · 9190b3b3
      Authored by Eric Dumazet
      In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
      a problem with pfifo_head drops that incorrectly decreased
      sch->bstats.bytes and sch->bstats.packets
      
      Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
      previously enqueued packet, and bstats cannot be changed, so
      bstats/rates are not accurate (overestimated).
      
      This patch changes the qdisc_bstats updates to be done at dequeue() time
      instead of enqueue() time. bstats counters no longer account for dropped
      frames, and rates are more correct, since enqueue() bursts don't have an
      effect on the dequeue() rate.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9190b3b3
    • E
      net_sched: move TCQ_F_THROTTLED flag · fd245a4a
      Authored by Eric Dumazet
      In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
      ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
      the cache line containing qdisc lock and often dirtied fields.
      
      I now move the TCQ_F_THROTTLED bit too, so that the first cache line
      stays read-mostly and shared by all cpus. This should speed up HTB/CBQ,
      for example.
      
      Not using test_bit()/__clear_bit()/__test_and_set_bit() allows us to
      use an "unsigned int" for the __state container, reducing Qdisc size
      by 8 bytes.
      
      Introduce helpers to hide implementation details.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd245a4a
  24. 20 Jan, 2011 1 commit
  25. 11 Jan, 2011 1 commit
  26. 21 Oct, 2010 1 commit
  27. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Authored by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming their availability.  As this
      conversion needs to touch a large number of source files, the following
      script was used as the basis of the conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  I.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints
        out an error message indicating which .h file needs to be added to
        the file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, while for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from the tests in
      step 7, I'm fairly confident about the coverage of this conversion
      patch.  If there is a breakage, it's likely to be something in one of
      the arch headers, which should be easily discoverable on most builds
      of the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  28. 30 Nov, 2009 1 commit
  29. 20 Apr, 2009 1 commit
    • J
      net: sch_netem: Fix an inconsistency in ingress netem timestamps. · 8caf1539
      Authored by Jarek Poplawski
      Alex Sidorenko reported:
      
      "while experimenting with 'netem' we have found some strange behaviour. It
      seemed that ingress delay as measured by 'ping' command shows up on some
      hosts but not on others.
      
      After some investigation I have found that the problem is that skbuff->tstamp
      field value depends on whether there are any packet sniffers enabled. That
      is:
      
      - if any ptype_all handler is registered, the tstamp field is as expected
      - if there are no ptype_all handlers, the tstamp field does not show the delay"
      
      This patch prevents the unnecessary update of tstamp in
      dev_queue_xmit_nit() on the ingress path (with act_mirred) by adding a
      check, so there is minimal overhead on the fast path; the update
      happens only when sniffers etc. are active.
      
      Since netem at ingress seems to logically emulate a network before a host,
      tstamp is zeroed to trigger the update and pretend delays are from the
      outside.
      Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
      Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8caf1539
  30. 23 Dec, 2008 1 commit
  31. 15 Dec, 2008 1 commit
  32. 20 Nov, 2008 1 commit
  33. 14 Nov, 2008 1 commit