1. 05 Mar, 2011 (1 commit)
  2. 04 Mar, 2011 (1 commit)
  3. 26 Feb, 2011 (1 commit)
  4. 25 Feb, 2011 (8 commits)
  5. 24 Feb, 2011 (3 commits)
    • em_meta: fix sparse warning · e0c56310
      stephen hemminger committed
      gfp_t needs to be cast to integer.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mqprio: cleanups · ea18fd95
      stephen hemminger committed
      * make qdisc_ops local
      * add sparse annotation about the expected unlock/lock in dump_class_stats
      * fix indentation
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: SFB flow scheduler · e13e02a3
      Eric Dumazet committed
      This is the Stochastic Fair Blue scheduler, based on work from:
      
      W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
      Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.
      
      http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
      
      This implementation is based on work done by Juliusz Chroboczek
      
      The general SFB algorithm can be found in figure 14, page 15:
      
      B[l][n] : L x N array of bins (L levels, N bins per level)
      enqueue()
      Calculate hash function values h{0}, h{1}, .. h{L-1}
      Update bins at each level
      for i = 0 to L - 1
         if (B[i][h{i}].qlen > bin_size)
            B[i][h{i}].p_mark += p_increment;
         else if (B[i][h{i}].qlen == 0)
            B[i][h{i}].p_mark -= p_decrement;
      p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
      if (p_min == 1.0)
          ratelimit();
      else
          mark/drop with probability p_min;
      
      I adapted Juliusz's code to meet current kernel standards, and made
      various changes to address previous comments:
      
      http://thread.gmane.org/gmane.linux.network/90225
      http://thread.gmane.org/gmane.linux.network/90375
      
      The default flow classifier is the rxhash introduced by RPS in 2.6.35,
      but an external flow classifier can be used if desired.
      
      tc qdisc add dev $DEV parent 1:11 handle 11:  \
              est 0.5sec 2sec sfb limit 128
      
      tc filter add dev $DEV protocol ip parent 11: handle 3 \
              flow hash keys dst divisor 1024
      
      Notes:
      
      1) SFB's default child qdisc is pfifo_fast. It can be changed to
      another qdisc, but a child qdisc MUST NOT drop a packet it previously
      queued, because SFB needs to handle each dequeued packet to maintain
      its virtual queue states. pfifo_head_drop and CHOKe should therefore
      not be used.
      
      2) ECN is enabled by default, unlike RED/CHOKe/GRED
      
      With help from Patrick McHardy & Andi Kleen
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
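
      For illustration, a minimal userspace C sketch of the enqueue decision
      above. The level/bin counts, the fixed-point probability scale and all
      names here are assumptions made for the sketch, not the actual
      sch_sfb.c code:

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdlib.h>

      #define SFB_LEVELS   8              /* L: assumed number of levels */
      #define SFB_BINS     16             /* N: assumed bins per level */
      #define SFB_MAX_PROB 0xFFFF         /* 1.0 in 16-bit fixed point */

      struct sfb_bin {
              uint32_t qlen;              /* virtual queue length */
              uint32_t p_mark;            /* mark/drop probability */
      };

      static struct sfb_bin bins[SFB_LEVELS][SFB_BINS];

      /* Update the L bins picked by the hashes h[] and decide the packet's
       * fate: true means mark/drop (or rate-limit when all bins saturate). */
      static bool sfb_enqueue_decision(const uint32_t h[SFB_LEVELS],
                                       uint32_t bin_size,
                                       uint32_t p_inc, uint32_t p_dec)
      {
              uint32_t p_min = SFB_MAX_PROB;

              for (int i = 0; i < SFB_LEVELS; i++) {
                      struct sfb_bin *b = &bins[i][h[i] % SFB_BINS];

                      if (b->qlen > bin_size)
                              b->p_mark = (b->p_mark + p_inc < SFB_MAX_PROB) ?
                                          b->p_mark + p_inc : SFB_MAX_PROB;
                      else if (b->qlen == 0)
                              b->p_mark = (b->p_mark > p_dec) ?
                                          b->p_mark - p_dec : 0;

                      if (b->p_mark < p_min)
                              p_min = b->p_mark;
              }

              if (p_min == SFB_MAX_PROB)  /* every level saturated */
                      return true;        /* caller should rate-limit */

              return ((uint32_t)rand() & SFB_MAX_PROB) < p_min;
      }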
  6. 23 Feb, 2011 (1 commit)
  7. 21 Feb, 2011 (1 commit)
    • net: Fix more stale on-stack list_head objects. · 5f04d506
      Eric W. Biederman committed
      From: Eric W. Biederman <ebiederm@xmission.com>
      
      In the beginning, with batching, unreg_list was a list that was used
      only once in the lifetime of a network device (I think).  Now we have
      calls using the unreg_list that can happen multiple times in the life
      of a network device, like dev_deactivate and dev_close.  In addition,
      in unregister_netdevice_queue we also do a list_move, because for
      devices like veth pairs it is possible that unregister_netdevice_queue
      will be called multiple times.

      So I think the change below, fixing dev_deactivate (which Eric D.
      missed), will fix this problem.  Now to go test that.
      Signed-off-by: David S. Miller <davem@davemloft.net>
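
      The shape of the fix, sketched against the kernel list API (the body
      of dev_deactivate() beyond the list handling is elided):

      void dev_deactivate(struct net_device *dev)
      {
              LIST_HEAD(single);          /* on-stack list head */

              list_add(&dev->unreg_list, &single);
              dev_deactivate_many(&single);
              /* Unlink before returning: otherwise dev->unreg_list keeps
               * pointing into this soon-dead stack frame, and a later
               * list_move() on unreg_list corrupts memory. */
              list_del(&single);
      }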
  8. 15 Feb, 2011 (1 commit)
  9. 03 Feb, 2011 (3 commits)
  10. 27 Jan, 2011 (1 commit)
  11. 22 Jan, 2011 (1 commit)
    • net_sched: TCQ_F_CAN_BYPASS generalization · 23624935
      Eric Dumazet committed
      Now that the qdisc stab is handled before the TCQ_F_CAN_BYPASS test in
      __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to qdiscs other
      than pfifo_fast: pfifo, bfifo, pfifo_head_drop and sfq.

      SFQ is special because it can have external classifiers, and in these
      cases we cannot bypass the queue discipline (the packet could be
      dropped by a classifier) without the admin asking for it, or further
      changes.

      It's worth doing this, especially for SFQ, to avoid dirtying memory
      when no packets are already waiting in the queue.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
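
      A simplified sketch of the bypass fast path in __dev_xmit_skb(); the
      surrounding variables (skb, q, dev, txq, root_lock, rc) come from the
      enclosing function, and locking details are trimmed:

      if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
          qdisc_run_begin(q)) {
              /* Queue is empty: transmit directly, skipping the
               * enqueue/dequeue round-trip and the memory it dirties. */
              qdisc_bstats_update(q, skb);
              if (sch_direct_xmit(skb, q, dev, txq, root_lock))
                      __qdisc_run(q);     /* device busy: drain what queued */
              else
                      qdisc_run_end(q);
              rc = NET_XMIT_SUCCESS;
      } else {
              rc = qdisc_enqueue_root(skb, q);
              qdisc_run(q);
      }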
  12. 21 Jan, 2011 (4 commits)
    • net_sched: accurate bytes/packets stats/rates · 9190b3b3
      Eric Dumazet committed
      In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed a
      problem with pfifo_head_drop that incorrectly decreased
      sch->bstats.bytes and sch->bstats.packets.

      Several qdiscs (CHOKe, SFQ, pfifo_head_drop, ...) are able to drop a
      previously enqueued packet, and since bstats cannot be decremented,
      bstats/rates are not accurate (they are over-estimated).

      This patch changes the qdisc_bstats updates to be done at dequeue()
      time instead of enqueue() time. bstats counters no longer account for
      dropped frames, and rates are more correct, since enqueue() bursts
      don't have an effect on the dequeue() rate.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
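
      The idea, sketched: a small helper does the accounting and qdiscs call
      it on the dequeue path instead of the enqueue path (helper name per
      the description above; the fifo caller is illustrative):

      static inline void qdisc_bstats_update(struct Qdisc *sch,
                                             const struct sk_buff *skb)
      {
              sch->bstats.bytes += qdisc_pkt_len(skb);
              sch->bstats.packets++;
      }

      static struct sk_buff *fifo_dequeue(struct Qdisc *sch)
      {
              struct sk_buff *skb = __skb_dequeue(&sch->q);

              if (skb)
                      qdisc_bstats_update(sch, skb); /* account on dequeue */
              return skb;
      }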
    • net_sched: RCU conversion of stab · a2da570d
      Eric Dumazet committed
      This patch converts stab qdisc management to RCU, so that we can
      perform the qdisc_calculate_pkt_len() call before taking the qdisc
      lock.

      This shortens the time the lock is held in __dev_xmit_skb().

      It also permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding a
      lot of cache misses and thereby reducing latencies.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
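
      A sketch of the resulting lockless read side, assuming q->stab is now
      an RCU-protected pointer (names follow the description above):

      static inline void qdisc_calculate_pkt_len(struct sk_buff *skb,
                                                 struct Qdisc *q)
      {
              struct qdisc_size_table *stab;

              /* The xmit path runs with BH disabled, so no lock is
               * needed to read the stab pointer. */
              stab = rcu_dereference_bh(q->stab);
              if (stab)
                      __qdisc_calculate_pkt_len(skb, stab);
      }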
    • net_sched: move TCQ_F_THROTTLED flag · fd245a4a
      Eric Dumazet committed
      In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
      ops) I moved the QDISC_STATE_RUNNING flag to the __state container,
      located in the cache line containing the qdisc lock and often-dirtied
      fields.

      I now move the TCQ_F_THROTTLED bit too, so that the first cache line
      stays read-mostly and shared by all cpus. This should speed up HTB/CBQ,
      for example.

      Not using test_bit()/__clear_bit()/__test_and_set_bit() allows us to
      use an "unsigned int" for the __state container, reducing Qdisc size
      by 8 bytes.

      Introduce helpers to hide implementation details.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
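
      The helpers plausibly look like this (a sketch from the description;
      exact names and bit values are assumptions):

      enum qdisc___state_t {
              __QDISC___STATE_RUNNING   = 1,
              __QDISC___STATE_THROTTLED = 2,
      };

      static inline bool qdisc_is_throttled(const struct Qdisc *qdisc)
      {
              return (qdisc->__state & __QDISC___STATE_THROTTLED) ? true : false;
      }

      static inline void qdisc_throttled(struct Qdisc *qdisc)
      {
              qdisc->__state |= __QDISC___STATE_THROTTLED;
      }

      static inline void qdisc_unthrottled(struct Qdisc *qdisc)
      {
              qdisc->__state &= ~__QDISC___STATE_THROTTLED;
      }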
    • net_sched: sfq: allow divisor to be a parameter · 817fb15d
      Eric Dumazet committed
      SFQ currently uses a 1024-slot hash table, and its internal structure
      (sfq_sched_data) allocation needs an order-1 page on x86_64.

      Allow the tc command to specify a divisor value (hash table size),
      between 1 and 65536.
      If no value is provided, assume the default size of 1024.

      This allows admins to set up smaller (or bigger) SFQ for specific
      needs.

      This also brings sfq_sched_data allocations back to order-0 ones,
      saving 3KB per SFQ qdisc.

      Jesper uses ~55,000 SFQ qdiscs on one machine; this patch should free
      165 MB of memory.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
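
      A usage example in the style of the tc snippets above (assuming an
      iproute2 with matching sfq divisor support; device and sizes are
      illustrative):

      tc qdisc add dev eth0 root sfq perturb 10 divisor 256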
  13. 20 Jan, 2011 (2 commits)
    • net_sched: cleanups · cc7ec456
      Eric Dumazet committed
      Clean up net/sched code to match current CodingStyle and practices.

      Reduce inline abuse.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: implement a root container qdisc sch_mqprio · b8970f0b
      John Fastabend committed
      This implements an mqprio queueing discipline that by default creates
      a pfifo_fast qdisc per tx queue and provides the needed configuration
      interface.

      Using the mqprio qdisc, the number of traffic classes currently in
      use and the range of queues allotted to each class can be configured.
      By default skbs are mapped to traffic classes using the skb priority.
      This mapping is configurable.

      Configurable parameters:
      
      struct tc_mqprio_qopt {
      	__u8    num_tc;
      	__u8    prio_tc_map[TC_BITMASK + 1];
      	__u8    hw;
      	__u16   count[TC_MAX_QUEUE];
      	__u16   offset[TC_MAX_QUEUE];
      };
      
      Here the count/offset pairing gives the queue alignment and the
      prio_tc_map gives the mapping from skb->priority to tc.

      The hw bit determines if the hardware should configure the count
      and offset values. If the hardware bit is set, then the operation
      will fail if the hardware does not implement the ndo_setup_tc
      operation. This is to avoid undetermined states where the hardware
      may or may not control the queue mapping. Also, minimal bounds
      checking is done on the count/offset to verify a queue does not
      exceed num_tx_queues and that queue ranges do not overlap. Otherwise
      it is left to user policy or hardware configuration to create
      useful mappings.

      It is expected that hardware QoS schemes can be implemented by
      creating appropriate mappings of queues in ndo_setup_tc().

      One expected use case is that drivers will use ndo_setup_tc to map
      queue ranges onto 802.1Q traffic classes. This provides a generic
      mechanism to map network traffic onto these traffic classes and
      removes the need for lower layer drivers to know specifics about
      traffic types.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
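
      An illustrative configuration in the tc style used elsewhere in this
      log (assuming mqprio support in iproute2): 4 traffic classes over
      8 tx queues, two queues per class, no hardware offload:

      tc qdisc add dev eth0 root mqprio num_tc 4 \
              map 0 0 1 1 2 2 3 3 0 0 0 0 0 0 0 0 \
              queues 2@0 2@2 2@4 2@6 hw 0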
  14. 14 Jan, 2011 (2 commits)
    • netfilter: fix Kconfig dependencies · c7066f70
      Patrick McHardy committed
      Fix the dependencies of the netfilter realm match: it depends on
      NET_CLS_ROUTE, which itself depends on NET_SCHED; this dependency is
      missing from netfilter.

      Since matching on realms is also useful without NET_SCHED enabled, and
      the option really only controls whether the tclassid member is included
      in route and dst entries, rename the config option to IP_ROUTE_CLASSID
      and move it outside of the traffic scheduling context to get rid of the
      NET_SCHED dependency.
      Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • net: remove dev_txq_stats_fold() · 1ac9ad13
      Eric Dumazet committed
      After recent changes (percpu stats on vlan/tunnels...), we no longer
      need the per-struct-netdev_queue tx_bytes/tx_packets/tx_dropped
      counters.

      The only remaining users are ixgbe, sch_teql, gianfar & macvlan:

      1) ixgbe can be converted to use the existing tx_ring counters.

      2) macvlan incremented txq->tx_dropped; it can use the
      dev->stats.tx_dropped counter instead.

      3) sch_teql: almost revert ab35cd4b (Use net_device internal stats).
          Now that we have ndo_get_stats64(), use it, even for "unsigned
      long" fields (no need to bring back a struct net_device_stats).

      4) gianfar adds a stats structure per tx queue to hold
      tx_bytes/tx_packets.

      This removes a lockdep warning (and possible lockup) in the rndis
      gadget, which calls dev_get_stats() from hard IRQ context.

      Ref: http://www.spinics.net/lists/netdev/msg149202.html
      Reported-by: Neil Jones <neiljay@gmail.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
      CC: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
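
      For item (3), a sketch of the ndo_get_stats64 shape being adopted; the
      teql_master field names here are illustrative assumptions:

      static struct rtnl_link_stats64 *
      teql_master_stats64(struct net_device *dev,
                          struct rtnl_link_stats64 *stats)
      {
              struct teql_master *m = netdev_priv(dev);

              /* The driver fills the 64-bit stats itself; no per-txq
               * counters and no dev_txq_stats_fold(). */
              stats->tx_bytes   = m->tx_bytes;
              stats->tx_packets = m->tx_packets;
              stats->tx_dropped = m->tx_dropped;
              return stats;
      }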
  15. 11 Jan, 2011 (1 commit)
  16. 06 Jan, 2011 (1 commit)
    • net_sched: pfifo_head_drop problem · 44b82883
      Eric Dumazet committed
      commit 57dbb2d8 (sched: add head drop fifo queue) introduced
      pfifo_head_drop and broke the invariant that sch->bstats.bytes and
      sch->bstats.packets are COUNTERs (increasing counters only).

      This can break estimators, because est_timer() handles unsigned deltas
      only. A decreasing counter can then produce a huge unsigned delta.

      My mid-term suggestion would be to change things so that
      sch->bstats.bytes and sch->bstats.packets are incremented in dequeue()
      only, not at enqueue() time. We could also add drop_bytes/drop_packets
      and provide estimations of drop rates.

      That would be more sensible anyway for very low speeds and big bursts.
      Right now, if we drop packets, they are still accounted in the
      byte/packet absolute counters and rate estimators.

      Before this mid-term change, this patch makes pfifo_head_drop behave
      like other qdiscs in case of drops:
      don't decrement sch->bstats.bytes and sch->bstats.packets.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
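
      A sketch of the corrected drop path (the function name and the limit
      parameter are illustrative):

      static int head_drop_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                                   unsigned int limit)
      {
              if (skb_queue_len(&sch->q) >= limit) {
                      /* Drop the oldest packet, but do NOT decrement
                       * sch->bstats: bytes/packets must remain
                       * increasing-only for est_timer(). */
                      struct sk_buff *victim = __skb_dequeue(&sch->q);

                      kfree_skb(victim);
                      sch->qstats.drops++;
              }
              return qdisc_enqueue_tail(skb, sch);
      }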
  17. 04 Jan, 2011 (1 commit)
  18. 01 Jan, 2011 (2 commits)
  19. 23 Dec, 2010 (1 commit)
    • sfq: fix sfq class stats handling · ee09b3c1
      Eric Dumazet committed
      sfq_walk() runs without the qdisc lock. By the time it selects a
      non-empty hash slot and sfq_dump_class_stats() is run (with the lock
      held), the slot might have been freed: we then access
      q->slots[SFQ_EMPTY_SLOT], out of bounds, and crash in
      slot_queue_walk().

      On previous kernels the bug is present too, but the out-of-bounds
      qs[SFQ_DEPTH] and allot[SFQ_DEPTH] entries are located inside struct
      sfq_sched_data, so no illegal memory access happens; only possibly
      wrong data is reported to the user.

      Also, slot_dequeue_tail() should make sure the slot's skb chain is
      correctly terminated, or sfq_dump_class_stats() can access freed skbs.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
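
      The fix, sketched (using the sfq_slot layout from the "better struct
      layouts" commit below; treat the details as an approximation):

      static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
                                      struct gnet_dump *d)
      {
              struct sfq_sched_data *q = qdisc_priv(sch);
              sfq_index idx = q->ht[cl - 1];
              struct gnet_stats_queue qs = { 0 };

              /* The slot may have emptied and been freed since the
               * lockless sfq_walk() selected it: re-check under the lock. */
              if (idx != SFQ_EMPTY_SLOT)
                      qs.qlen = q->slots[idx].qlen;

              return gnet_stats_copy_queue(d, &qs);
      }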
  20. 21 Dec, 2010 (3 commits)
    • net_sched: sch_sfq: better struct layouts · eda83e3b
      Eric Dumazet committed
      Here is a respin of the patch.

      I'll send a short patch to make SFQ more fair in the presence of
      large packets as well.

      Thanks

      [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts

      This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more
      if spinlocks are bigger) to 0x1180 bytes, and reduces text size as
      well.
      
         text    data     bss     dec     hex filename
         4821     152       0    4973    136d old/net/sched/sch_sfq.o
         4627     136       0    4763    129b new/net/sched/sch_sfq.o
      
      All data for a slot/flow is now grouped in a compact, cache-friendly
      structure, instead of being spread across many different places:
      
      struct sfq_slot {
              struct sk_buff  *skblist_next;
              struct sk_buff  *skblist_prev;
              sfq_index       qlen; /* number of skbs in skblist */
              sfq_index       next; /* next slot in sfq chain */
              struct sfq_head dep; /* anchor in dep[] chains */
              unsigned short  hash; /* hash value (index in ht[]) */
              short           allot; /* credit for this slot */
      };
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jarek Poplawski <jarkao2@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_sfq: fix allot handling · aa3e2199
      Eric Dumazet committed
      When deploying SFQ/IFB here at work, I found the allot management in
      sfq was pretty wrong, even after changing allot from short to int...

      We should initialize allot for each new flow, not reuse a previous
      value found in the slot.

      Before the patch, I saw bursts of several packets per flow, apparently
      defeating the default "quantum 1514" limit I had on my SFQ class.
      
      class sfq 11:1 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 7p requeues 0 
       allot 11546 
      
      class sfq 11:46 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 1p requeues 0 
       allot -23873 
      
      class sfq 11:78 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 5p requeues 0 
       allot 11393 
      
      After the patch, fairness among flows is better, the allot limit is
      respected, and allot stays positive:
      
      class sfq 11:e parent 11: 
       (dropped 0, overlimits 0 requeues 86) 
       backlog 0b 3p requeues 86 
       allot 596 
      
      class sfq 11:94 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 3p requeues 0 
       allot 1468 
      
      class sfq 11:a4 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 4p requeues 0 
       allot 650 
      
      class sfq 11:bb parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 3p requeues 0 
       allot 596 
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
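
      The gist of the fix, sketched (placement inside sfq_enqueue() and the
      exact credit value are inferred from the description, not verbatim):

      if (slot->qlen == 1) {              /* flow just became active */
              /* ... link the slot into the round-robin list ... */

              /* The fix: start the new flow with a full quantum of
               * credit instead of the stale allot left in the slot. */
              slot->allot = q->quantum;
      }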
    • net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() · c4266263
      Eric Dumazet committed
      We currently return, for each active SFQ slot, the number of packets
      in its queue. We can also report the number of bytes accounted for by
      these packets.
      
      tc -s class show dev ifb0
      
      Before patch:
      
      class sfq 11:3d9 parent 11:
       (dropped 0, overlimits 0 requeues 0)
       backlog 0b 3p requeues 0
       allot 1266
      
      After patch:
      
      class sfq 11:3e4 parent 11:
       (dropped 0, overlimits 0 requeues 0)
       backlog 4380b 3p requeues 0
       allot 1212
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
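
      Sketched, the addition inside sfq_dump_class_stats() (slot_queue_walk
      is sch_sfq's skb-list iterator; treat the exact form as an
      assumption):

      if (idx != SFQ_EMPTY_SLOT) {
              const struct sfq_slot *slot = &q->slots[idx];
              struct sk_buff *skb;

              qs.qlen = slot->qlen;
              slot_queue_walk(slot, skb)  /* every skb parked in the slot */
                      qs.backlog += qdisc_pkt_len(skb);
      }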
  21. 17 Dec, 2010 (1 commit)