1. 10 Dec 2011, 1 commit
    • sch_red: generalize accurate MAX_P support to RED/GRED/CHOKE · a73ed26b
      Committed by Eric Dumazet
      Now that RED uses a Q0.32 fixed-point number to store max_p (maximum
      probability), allow RED/GRED/CHOKE to use and report the full
      resolution at config/dump time.
      
      Old tc binaries are not aware of the new attributes and still set/get Plog.
      
      New tc binaries set/get both Plog and max_p for backward compatibility,
      and they display the probability value if they get max_p from new kernels.
      
      # tc -d  qdisc show dev ...
      ...
      qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
      probability 0.09 Scell_log 15
      
      Make sure we avoid a potential divide by 0 in reciprocal_value() if
      (max_th - min_th) is large. The Q0.32 mapping is sketched after this entry.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a73ed26b
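
      A minimal userspace sketch of the Q0.32 fixed-point mapping described
      above (illustrative, not kernel code): the whole 32-bit word holds the
      fractional part, so a probability p maps to p * 2^32.

          #include <stdint.h>
          #include <stdio.h>

          /* Convert a probability in [0, 1) to Q0.32 fixed point. */
          static uint32_t prob_to_q0_32(double p)
          {
                  return p >= 1.0 ? UINT32_MAX : (uint32_t)(p * 4294967296.0);
          }

          /* Convert back: v / 2^32. */
          static double q0_32_to_prob(uint32_t v)
          {
                  return v / 4294967296.0;
          }

          int main(void)
          {
                  /* 0.09 is the probability shown in the tc dump above. */
                  uint32_t max_p = prob_to_q0_32(0.09);
                  printf("max_p = %u (%.6f)\n", max_p, q0_32_to_prob(max_p));
                  return 0;
          }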
  2. 09 Dec 2011, 6 commits
  3. 07 Dec 2011, 4 commits
  4. 06 Dec 2011, 3 commits
  5. 05 Dec 2011, 2 commits
    • perf: Fix loss of notification with multi-event · 10c6db11
      Committed by Peter Zijlstra
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10 s at
      1000 samples/sec. However, this is not what happens; you get
      far fewer samples, maybe 3700 per event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On an Intel Nehalem or even AMD64, there are four counters capable
      of measuring cycles, so there is plenty of room to measure these
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we would expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straightforward solution of moving the waitq into the
      ring-buffer object doesn't work because of lifetime issues: one could
      call perf_event_set_output() on an fd that another thread is blocking
      on, causing the old rb object to be freed while its waitq is still
      referenced by the blocked thread -> FAIL.
      
      Therefore, link all events to the ring buffer and broadcast the wakeup
      from the ring-buffer object to all events that could be waited
      upon. This is rather ugly, and we're open to better solutions, but it
      works for now; a simplified model of the broadcast follows this entry.
      Reported-by: Stephane Eranian <eranian@google.com>
      Finished-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      10c6db11
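
      A simplified userspace model of the fix (names are illustrative, not
      the kernel's; the real code keeps the events on the ring buffer's
      list and wakes each one's waitqueue):

          #include <stdio.h>

          #define MAX_EVENTS 4

          struct event {
                  const char *name;
                  int woken; /* stand-in for waking the event's waitqueue */
          };

          /* Every event that maps this buffer is linked to it. */
          struct ring_buffer {
                  struct event *events[MAX_EVENTS];
                  int nr_events;
          };

          static void rb_attach(struct ring_buffer *rb, struct event *e)
          {
                  rb->events[rb->nr_events++] = e;
          }

          /* The fix: on overflow, wake every attached event, not just
           * the one whose fd happened to overflow. */
          static void rb_wakeup_all(struct ring_buffer *rb)
          {
                  for (int i = 0; i < rb->nr_events; i++) {
                          rb->events[i]->woken = 1;
                          printf("woke %s\n", rb->events[i]->name);
                  }
          }

          int main(void)
          {
                  struct ring_buffer rb = { .nr_events = 0 };
                  struct event a = { "cycles#0", 0 }, b = { "cycles#1", 0 };

                  rb_attach(&rb, &a);
                  rb_attach(&rb, &b);
                  rb_wakeup_all(&rb); /* perf polls only a's fd, but both wake */
                  return 0;
          }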
    • tcp: take care of misalignments · 117632e6
      Committed by Eric Dumazet
      We discovered that the TCP stack could retransmit misaligned skbs if a
      malicious peer acknowledged a sub-MSS frame. This currently can happen
      only if the output interface is not SG-enabled: if SG is enabled, TCP
      builds headless skbs (all payload is included in fragments), so the TCP
      trimming process only removes parts of skb fragments and the header
      stays aligned.
      
      Some arches can't handle misalignments, so force a head reallocation and
      shrink the headroom to MAX_TCP_HEADER.
      
      Don't care about misalignments on x86 and PPC (or other arches setting
      NET_IP_ALIGN to 0).
      
      This patch introduces __pskb_copy(), which lets the caller specify the
      headroom of the new head; pskb_copy() becomes a wrapper on top of
      __pskb_copy(), as sketched after this entry.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      117632e6
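
      A sketch of the helper relationship described above (simplified; the
      real declarations live in include/linux/skbuff.h):

          /* __pskb_copy() takes an explicit headroom for the new head. */
          struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom,
                                      gfp_t gfp_mask);

          /* pskb_copy() keeps its old behaviour by passing the skb's
           * current headroom through. */
          static inline struct sk_buff *pskb_copy(struct sk_buff *skb,
                                                  gfp_t gfp_mask)
          {
                  return __pskb_copy(skb, skb_headroom(skb), gfp_mask);
          }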
  6. 04 Dec 2011, 4 commits
  7. 02 Dec 2011, 2 commits
  8. 01 Dec 2011, 3 commits
    • netem: rate extension · 7bc0f28c
      Committed by Hagen Paul Pfeifer
      Currently netem is not able to emulate channel bandwidth; only static
      delay (and optional random jitter) can be configured.
      
      To emulate the channel rate, the token bucket filter (sch_tbf) can be
      used, but TBF has some major emulation flaws. The buffer (token bucket
      depth/rate) cannot be 0. Also, the idea behind TBF is that the credit
      (tokens in the bucket) fills up if no packet is transmitted, so that
      there is always a "positive" credit for new packets. In real life this
      behavior contradicts the laws of nature, where nothing can travel
      faster than the speed of light. E.g., on an emulated 1000 byte/s link,
      a small IPv4/TCP SYN packet of ~50 bytes requires ~0.05 seconds, not 0
      seconds.
      
      Netem is an excellent place to implement a rate-limiting feature: static
      delay is already implemented, tfifo already carries time information,
      and the user can skip TBF configuration completely.
      
      This patch implements a rate feature which can be configured via tc, e.g.:
      
      	tc qdisc add dev eth0 root netem rate 10kbit
      
      To emulate a link of 5000 byte/s with an additional static delay of 10 ms:
      
      	tc qdisc add dev eth0 root netem delay 10ms rate 5KBps
      
      Note: similar to TBF, the rate extension is bound to the kernel timing
      system, so depending on the architecture's timer granularity, higher
      rates (e.g. 10 Mbit/s and above) tend to produce transmission bursts.
      Also note the further queues living in network adapters; see
      ethtool(8). The per-packet delay computation is sketched after this
      entry.
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7bc0f28c
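
      A userspace sketch of the per-packet delay a rate emulator must add
      (illustrative; simply len / rate, using the commit's SYN example):

          #include <stdint.h>
          #include <stdio.h>

          #define NSEC_PER_SEC 1000000000ULL

          /* Time a packet of len_bytes occupies a link of the given rate. */
          static uint64_t packet_time_ns(uint32_t len_bytes,
                                         uint64_t rate_bytes_per_sec)
          {
                  return len_bytes * NSEC_PER_SEC / rate_bytes_per_sec;
          }

          int main(void)
          {
                  /* ~50 byte SYN at 1000 byte/s: 50000000 ns = 0.05 s. */
                  printf("%llu ns\n",
                         (unsigned long long)packet_time_ns(50, 1000));
                  return 0;
          }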
    • neigh: Add device constructor/destructor capability. · da6a8fa0
      Committed by David Miller
      If the neigh entry has device private state, it will need
      constructor/destructor ops.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da6a8fa0
    • neigh: Add infrastructure for allocating device neigh privates. · 596b9b68
      Committed by David Miller
      netdev->neigh_priv_len records the private area length.
      
      This will trigger for neigh_table objects which set tbl->entry_size to
      zero; the first instances of this are forthcoming. A hypothetical usage
      sketch follows this entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      596b9b68
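
      A hedged sketch of how a driver might use this; struct my_neigh_priv
      and my_setup() are hypothetical, and neighbour_priv() is the accessor
      this series adds for reaching the private area:

          /* Hypothetical per-neighbour private state for a driver. */
          struct my_neigh_priv {
                  u32 hw_handle;
          };

          static void my_setup(struct net_device *dev)
          {
                  /* Tell the neighbour layer how much extra space to
                   * allocate at the end of each neigh entry. */
                  dev->neigh_priv_len = sizeof(struct my_neigh_priv);
          }

          /* Later, given a struct neighbour *n:
           *   struct my_neigh_priv *priv = neighbour_priv(n);
           */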
  9. 30 Nov 2011, 4 commits
    • bql: Byte queue limits · 114cf580
      Committed by Tom Herbert
      Networking stack support for byte queue limits, using the dynamic queue
      limits library.  Byte queue limits are maintained per transmit queue,
      and a dql structure has been added to the netdev_queue structure for
      this purpose.
      
      Configuration of bql lives in the tx-<n> sysfs directory for the queue,
      under the byte_queue_limits directory.  Configuration includes:
      - limit_min, bql minimum limit
      - limit_max, bql maximum limit
      - hold_time, bql slack hold time
      
      Also under the directory are:
      - limit, current byte limit
      - inflight, current number of bytes on the queue
      An example of adjusting these follows this entry.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      114cf580
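
      For example, assuming a device eth0 with queue tx-0 (paths per the
      description above), one could cap the limit and inspect the in-flight
      byte count:

      	cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
      	echo 100000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
      	cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/inflight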
    • net: Add netdev interfaces for recording sends/comp · c5d67bd7
      Committed by Tom Herbert
      Add interfaces for drivers to call to record the number of packets and
      bytes at send time and at transmit completion.  Also add a function to
      "reset" a queue.  These will be used by Byte Queue Limits; a usage
      sketch follows this entry.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5d67bd7
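
      A hedged sketch of where a driver would place these hooks; the
      surrounding driver structure (txq, pkts_done, bytes_done) is
      hypothetical, while the netdev_tx_* names are the interfaces this
      series adds:

          /* In the xmit path, after posting the skb to the HW ring: */
          netdev_tx_sent_queue(txq, skb->len);

          /* In TX-completion processing, after reclaiming descriptors: */
          netdev_tx_completed_queue(txq, pkts_done, bytes_done);

          /* When the ring is torn down or reset: */
          netdev_tx_reset_queue(txq);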
    • net: Add queue state xoff flag for stack · 73466498
      Committed by Tom Herbert
      Create separate queue state flags so that either the stack or the
      driver can turn on XOFF.  Also add a set of functions used in the stack
      to determine whether a queue is really stopped (by either the stack or
      the driver); a simplified sketch follows this entry.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73466498
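
      A simplified sketch of the split state, condensed from the idea above
      (the exact flag and helper names in netdevice.h may differ):

          /* One XOFF bit for the driver, one for the stack. */
          #define QUEUE_STATE_DRV_XOFF   (1 << __QUEUE_STATE_DRV_XOFF)
          #define QUEUE_STATE_STACK_XOFF (1 << __QUEUE_STATE_STACK_XOFF)
          #define QUEUE_STATE_ANY_XOFF   (QUEUE_STATE_DRV_XOFF | \
                                          QUEUE_STATE_STACK_XOFF)

          /* "Really stopped" means either side has raised XOFF. */
          static inline bool netif_xmit_stopped(const struct netdev_queue *q)
          {
                  return q->state & QUEUE_STATE_ANY_XOFF;
          }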
    • dql: Dynamic queue limits · 75957ba3
      Committed by Tom Herbert
      Implementation of dynamic queue limits (dql).  This is a library which
      allows a queue limit to be managed dynamically.  The goal of dql is to
      set the queue limit, the number of objects that may be queued, to the
      minimum that avoids starving the queue.
      
      dql would be used with a queue which has these properties:
      
      1) Objects are queued up to some limit which can be expressed as a
         count of objects.
      2) Periodically a completion process executes which retires consumed
         objects.
      3) Starvation occurs when the limit has been reached: all queued data
         has actually been consumed, but completion processing has not yet
         run, so queuing new data is blocked.
      4) Minimizing the amount of queued data is desirable.
      
      A canonical example of such a queue would be a NIC HW transmit queue.
      
      The queue limit is dynamic; it will increase or decrease over time
      depending on the workload.  The queue limit is recalculated each time
      completion processing is done.  Increases occur when the queue is
      starved and can grow exponentially over successive intervals.
      Decreases occur when more data is being kept in the queue than is
      needed to prevent starvation.  The number of extra objects, or "slack",
      is measured over successive intervals, and to avoid hysteresis the
      limit is only reduced by the minimum slack seen over a configurable
      time period.
      
      The dql API provides routines to manage the queue (see the sketch after
      this entry):
      - dql_init is called to initialize the dql structure
      - dql_reset is called to reset dynamic values
      - dql_queued is called when objects are enqueued
      - dql_avail returns the availability in the queue
      - dql_completed is called when objects have been consumed from the queue
      
      Configuration consists of:
      - max_limit, maximum limit
      - min_limit, minimum limit
      - slack_hold_time, time to measure instances of slack before reducing
        queue limit
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75957ba3
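
      A hedged sketch of the intended usage pattern, pieced together from the
      routine list above; stop_queue() and wake_queue() stand in for
      hypothetical driver helpers:

          struct dql q;

          dql_init(&q, HZ); /* slack hold time of one second */

          /* Enqueue path: account the bytes, stop if over the limit. */
          dql_queued(&q, skb->len);
          if (dql_avail(&q) < 0)
                  stop_queue();

          /* Completion path: retire consumed bytes, restart if room. */
          dql_completed(&q, bytes_done);
          if (dql_avail(&q) >= 0)
                  wake_queue();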
  10. 29 Nov 2011, 6 commits
  11. 28 Nov 2011, 3 commits
  12. 27 Nov 2011, 2 commits