1. 08 Jan 2012, 3 commits
  2. 06 Jan 2012, 4 commits
    • vfs: fix up ENOIOCTLCMD error handling · 07d106d0
      Committed by Linus Torvalds
      We're doing some odd things there, which already mess up various users
      (see the net/socket.c code that this removes), and it was going to add
      yet more crud to the block layer because of the incorrect error code
      translation.
      
      ENOIOCTLCMD is not an error return that should be returned to user mode
      from the "ioctl()" system call, and it should *not* be translated as
      EINVAL ("Invalid argument").  It should be translated as ENOTTY
      ("Inappropriate ioctl for device").
      
      That EINVAL confusion has apparently so permeated some code that the
      block layer actually checks for it, which is sad.  We continue to do so
      for now, but add a big comment about how wrong that is, and we should
      remove it entirely eventually.  In the meantime, this tries to keep the
      changes localized to just the EINVAL -> ENOTTY fix, and removing code
      that makes it harder to do the right thing.
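      The intended translation can be sketched in isolation. A minimal sketch, assuming ENOIOCTLCMD's kernel-internal value of 515; the helper name is hypothetical, not the kernel's actual code:

```c
#include <errno.h>

/* ENOIOCTLCMD is kernel-internal and not in userspace errno.h;
 * 515 is its kernel definition. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Translate a driver's ioctl return value before it reaches user mode:
 * ENOIOCTLCMD must become ENOTTY ("Inappropriate ioctl for device"),
 * not EINVAL ("Invalid argument"); real errors pass through untouched.
 * Illustrative helper, not the kernel's actual function. */
static long ioctl_translate_err(long err)
{
    return (err == -ENOIOCTLCMD) ? -ENOTTY : err;
}
```

      A caller that previously mapped ENOIOCTLCMD to EINVAL would instead run its result through a helper like this.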
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net_sched: red: split red_parms into parms and vars · eeca6688
      Committed by Eric Dumazet
      This patch splits the red_parms structure into two components.
      
      One holding the RED 'constant' parameters, and one containing the
      variables.
      
      This permits a size reduction of the GRED qdisc, and is a preliminary step
      to add an optional RED unit to SFQ.
      
      SFQRED will have a single red_parms structure shared by all flows, and a
      private red_vars per flow.
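      The shape of the split can be sketched like this; field names are illustrative, not the exact kernel layout:

```c
/* RED 'constant' parameters: tunables fixed at configuration time,
 * which can therefore be shared, e.g. one copy for all SFQ flows.
 * Field names are illustrative, not the exact kernel definitions. */
struct red_parms {
    unsigned int  qth_min;  /* min average-queue threshold */
    unsigned int  qth_max;  /* max average-queue threshold */
    unsigned char Wlog;     /* EWMA weight, as a power of two */
    unsigned char Plog;     /* drop-probability scale, as a power of two */
};

/* RED variables: per-instance state that must stay private per flow. */
struct red_vars {
    unsigned long qavg;     /* exponentially weighted average queue size */
    unsigned long qcount;   /* packets since the last marked packet */
    unsigned long qR;       /* cached random value for marking */
};
```

      With such a split, SFQRED can keep one shared red_parms and embed only a red_vars in each flow, shrinking per-flow state.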
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Dave Taht <dave.taht@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sfq: extend limits · 18cb8098
      Committed by Eric Dumazet
      SFQ as implemented in Linux is very limited, with at most 127 flows
      and a limit of 127 packets. [ So if 127 flows are active, we have one
      packet per flow. ]
      
      This patch brings the following features to SFQ to cope with modern needs.
      
      - Ability to specify a smaller per-flow limit of inflight packets
        (the default value being 127 packets)
      
      - Ability to have up to 65408 active flows (instead of 127)
      
      - Ability to have head drops instead of tail drops
        (to drop old packets from a flow)
      
      Example of use: no more than 20 packets per flow, max 8000 flows, max
      20000 packets in the SFQ qdisc, hash table of 65536 slots.
      
      tc qdisc add ... sfq \
              flows 8000 \
              depth 20 \
              headdrop \
              limit 20000 \
              divisor 65536
      
      RAM usage:
      
      2 bytes per hash table entry (instead of the previous 1 byte/entry),
      and 32 bytes per flow on 64-bit arches instead of 384 for QFQ, so a
      much better cache hit ratio.
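      Using the figures above (2 bytes per hash slot, 32 bytes per flow on 64-bit), the example configuration's footprint is easy to work out; the helper name is hypothetical:

```c
/* Back-of-envelope memory estimate using the figures from the commit
 * message: 2 bytes per hash-table slot, 32 bytes per flow on 64-bit.
 * Illustrative helper, not kernel code. */
static unsigned long sfq_mem_estimate(unsigned long divisor,
                                      unsigned long flows)
{
    return divisor * 2 + flows * 32;
}
```

      For the example above (divisor 65536, flows 8000) this gives 131072 + 256000 = 387072 bytes, well under half a megabyte.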
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: Bug in netem reordering · eb101924
      Committed by Hagen Paul Pfeifer
      Not now, but it looks like you are correct. q->qdisc is NULL until an
      additional qdisc is attached (beside tfifo). See 50612537.
      The following patch should work.
      
      From: Hagen Paul Pfeifer <hagen@jauu.net>
      
      netem: catch NULL pointer by updating the real qdisc statistic
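      The crash pattern and the guard can be modeled in a few lines; structure and function names are stand-ins for the kernel's, for illustration only:

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel structures. */
struct qstats { unsigned long packets; };
struct qdisc_model {
    struct qdisc_model *child;  /* stands in for q->qdisc: NULL until an
                                   additional qdisc is attached */
    struct qstats stats;
};

/* Update statistics on the real (netem) qdisc and only touch the child
 * when one is actually attached; dereferencing a NULL child would oops. */
static void netem_account(struct qdisc_model *q)
{
    q->stats.packets++;            /* always account on netem itself */
    if (q->child)
        q->child->stats.packets++; /* child attached: account there too */
}

/* Demonstrates that accounting survives a NULL child. */
static unsigned long netem_account_demo(void)
{
    struct qdisc_model q = { NULL, { 0 } };
    netem_account(&q);
    return q.stats.packets;
}
```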
      Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Jan 2012, 22 commits
  4. 04 Jan 2012, 4 commits
  5. 01 Jan 2012, 1 commit
  6. 31 Dec 2011, 6 commits
    • netfilter: ctnetlink: fix timeout calculation · c1216382
      Committed by Xi Wang
      The sanity check (timeout < 0) never works; the dividend is unsigned
      and so is the division, which should have been a signed division.
      
      	long timeout = (ct->timeout.expires - jiffies) / HZ;
      	if (timeout < 0)
      		timeout = 0;
      
      This patch converts the time values to signed for the division.
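      The bug is reproducible in isolation: with an unsigned dividend the quotient can never be negative, so the clamp is dead code. A minimal illustration of the fix, not the ctnetlink code itself:

```c
/* When the conntrack timer has already expired, expires - now wraps to
 * a huge unsigned value; an unsigned division then yields a huge
 * positive quotient and 'timeout < 0' never fires. Casting to signed
 * before dividing restores the intended clamp to zero. */
static long clamp_timeout(unsigned long expires, unsigned long now,
                          unsigned long hz)
{
    long timeout = (long)(expires - now) / (long)hz;
    if (timeout < 0)
        timeout = 0;
    return timeout;
}
```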
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: try also real server with port 0 in backup server · 52793dbe
      Committed by Julian Anastasov
      We should not forget to try the real server with port 0 in the backup
      server when processing the sync message. We should do it in all cases
      because the backup server can use a different forwarding method.
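      The lookup order can be sketched with a toy table; structure and function names here are hypothetical, not the ip_vs internals:

```c
#include <stddef.h>

/* Toy real-server table entry; in ip_vs this is a hashed lookup. */
struct rs_model { unsigned int addr; unsigned short port; };

static const struct rs_model *rs_find(const struct rs_model *tbl, size_t n,
                                      unsigned int addr, unsigned short port)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr && tbl[i].port == port)
            return &tbl[i];
    return NULL;
}

/* On the backup server, always retry with port 0 when the exact
 * (addr, port) pair is not found, since the real server may be
 * configured with port 0 under a different forwarding method. */
static const struct rs_model *rs_find_fallback(const struct rs_model *tbl,
                                               size_t n, unsigned int addr,
                                               unsigned short port)
{
    const struct rs_model *rs = rs_find(tbl, n, addr, port);
    return rs ? rs : rs_find(tbl, n, addr, 0);
}

/* Returns 0 when exact match, port-0 fallback, and miss all behave. */
static int rs_fallback_demo(void)
{
    static const struct rs_model tbl[] = { { 1, 0 }, { 2, 80 } };
    if (rs_find_fallback(tbl, 2, 2, 80) != &tbl[1]) return 1;   /* exact */
    if (rs_find_fallback(tbl, 2, 1, 8080) != &tbl[0]) return 2; /* falls back */
    if (rs_find_fallback(tbl, 2, 3, 80) != NULL) return 3;      /* miss */
    return 0;
}
```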
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netem: fix classful handling · 50612537
      Committed by Eric Dumazet
      Commit 10f6dfcf (Revert "sch_netem: Remove classful functionality")
      reintroduced classful functionality to netem, but broke basic netem
      behavior:
      
      netem uses a t(ime)fifo queue and stores timestamps in skb->cb[].
      
      If the qdisc is changed, time constraints are not respected and the
      other qdisc can destroy skb->cb[] and block netem at dequeue time.
      
      Fix this by always using the internal tfifo, and optionally attaching
      a child qdisc (or a tree of qdiscs) to netem.
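      The skb->cb[] usage that the fix protects can be modeled like so; sizes and names are stand-ins, assuming only the 48-byte control block:

```c
#include <string.h>

/* Stand-in for struct sk_buff's 48-byte control block. */
struct skb_model { char cb[48]; };

/* netem's private view of cb[]: the computed send time. A foreign
 * child qdisc that also writes cb[] would corrupt this, which is why
 * the tfifo must stay internal to netem. memcpy is used here to keep
 * the userspace sketch alignment-safe; kernel code casts instead. */
static void netem_cb_set(struct skb_model *skb, unsigned long long t)
{
    memcpy(skb->cb, &t, sizeof(t));
}

static unsigned long long netem_cb_get(const struct skb_model *skb)
{
    unsigned long long t;
    memcpy(&t, skb->cb, sizeof(t));
    return t;
}

/* Round-trips a timestamp through the control block. */
static unsigned long long netem_cb_demo(void)
{
    struct skb_model skb;
    netem_cb_set(&skb, 42);
    return netem_cb_get(&skb);
}
```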
      
      Example of use:
      
      DEV=eth3
      tc qdisc del dev $DEV root
      tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
      tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
              burst 20480 limit 20480 mtu 1514 rate 32000bps
      
      qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
       Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
       rate 18416bit 3pps backlog 0b 0p requeues 0
      qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
       Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
       backlog 0b 5p requeues 0
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IPv6: Avoid taking write lock for /proc/net/ipv6_route · 32b293a5
      Committed by Josh Hunt
      During some debugging I needed to look into how /proc/net/ipv6_route
      operated, and in my digging I found it's calling fib6_clean_all(), which
      takes "write_lock_bh(&table->tb6_lock)" before doing the walk of the
      table. I found this on 2.6.32, but reading the code I believe the same
      basic idea exists currently. Looking at the rtnetlink code, it only
      takes "read_lock_bh(&table->tb6_lock)" via fib6_dump_table(). While I
      realize reading from proc isn't the recommended way of fetching the
      ipv6 route table, taking a write lock seems unnecessary and would
      probably cause network performance issues.
      
      To verify this I loaded up the ipv6 route table and then ran iperf in 3
      cases:
        * doing nothing
        * reading ipv6 route table via proc
          (while :; do cat /proc/net/ipv6_route > /dev/null; done)
        * reading ipv6 route table via rtnetlink
          (while :; do ip -6 route show table all > /dev/null; done)
      
      * Load the ipv6 route table up with:
        * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
      
      * iperf commands:
        * client: iperf -i 1 -V -c <ipv6 addr>
        * server: iperf -V -s
      
      * iperf results - 3 runs each (in Mbits/sec)
        * nothing: client: 927,927,927 server: 927,927,927
        * proc: client: 179,97,96,113 server: 142,112,133
        * iproute: client: 928,927,928 server: 927,927,927
      
      lock_stat shows taking the write lock is causing the slowdown. Using this
      info I decided to write a version of fib6_clean_all() which replaces
      write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
      this new function I see the same results as with my rtnetlink iperf test.
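      Why the read lock removes the stall can be shown with a toy single-threaded model of rwlock semantics; names are illustrative, not kernel code:

```c
/* Toy reader/writer lock: many readers may hold it at once, but a
 * writer excludes everyone. Models rwlock_t semantics, nothing more. */
struct rwlock_model { int readers; int writers; };

static int try_read_lock(struct rwlock_model *l)
{
    if (l->writers) return 0;      /* readers wait behind any writer */
    l->readers++;
    return 1;
}

static int try_write_lock(struct rwlock_model *l)
{
    if (l->readers || l->writers) return 0; /* writer needs exclusivity */
    l->writers++;
    return 1;
}

/* Returns 1 when a concurrent route lookup (a reader) stalls while the
 * /proc dump holds the lock as a writer, yet proceeds while the dump
 * holds it as a reader -- the behavior the patch achieves. */
static int dump_contention_demo(void)
{
    struct rwlock_model tb6_lock = { 0, 0 };
    int stalled, proceeds;

    /* Old behavior: the dump takes the write lock, lookups stall. */
    try_write_lock(&tb6_lock);
    stalled = !try_read_lock(&tb6_lock);
    tb6_lock.writers = 0;          /* dump finished, lock released */

    /* New behavior: the dump takes the read lock, lookups proceed. */
    try_read_lock(&tb6_lock);
    proceeds = try_read_lock(&tb6_lock);

    return stalled && proceeds;
}
```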
      Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • unix_diag: Fixup RQLEN extension report · c9da99e6
      Committed by Pavel Emelyanov
      While it's not too late, fix the recently added RQLEN diag extension
      to report rqlen and wqlen in the same way as TCP does.
      
      I.e. for listening sockets report the ack backlog length (which is the
      input queue length for the socket) in rqlen and the max ack backlog
      length in wqlen; for established sockets report what the CINQ/OUTQ
      ioctls do.
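      The state-dependent reporting can be sketched as follows; a simplified model with hypothetical names, whereas the real code reads sk->sk_ack_backlog and the queue lengths the ioctls use:

```c
/* Simplified socket model for illustration only. */
enum sock_state_model { SS_LISTEN, SS_ESTABLISHED };

struct sock_model {
    enum sock_state_model state;
    unsigned int ack_backlog;      /* current accept-queue length */
    unsigned int max_ack_backlog;  /* listen() backlog limit      */
    unsigned int inq_bytes;        /* what the CINQ ioctl reports */
    unsigned int outq_bytes;       /* what the OUTQ ioctl reports */
};

/* Fill rqlen/wqlen the way the fixed RQLEN extension does: backlog
 * figures for listeners, CINQ/OUTQ figures for established sockets. */
static void diag_rqlen(const struct sock_model *sk,
                       unsigned int *rqlen, unsigned int *wqlen)
{
    if (sk->state == SS_LISTEN) {
        *rqlen = sk->ack_backlog;
        *wqlen = sk->max_ack_backlog;
    } else {
        *rqlen = sk->inq_bytes;
        *wqlen = sk->outq_bytes;
    }
}

/* Encodes rqlen and wqlen for a listener with backlog 3 of 128. */
static unsigned int diag_rqlen_demo(void)
{
    struct sock_model sk = { SS_LISTEN, 3, 128, 0, 0 };
    unsigned int r, w;
    diag_rqlen(&sk, &r, &w);
    return r * 1000 + w;
}
```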
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_unix: Move CINQ/COUTQ code to helpers · 885ee74d
      Committed by Pavel Emelyanov
      Currently tcp diag reports rqlen and wqlen values similarly to how the
      CINQ/COUTQ ioctls do. To make unix diag report these values in the
      same way, move the respective code into helpers.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>