1. 29 Jan, 2008 1 commit
  2. 26 Jan, 2008 1 commit
    • IPoIB: improve IPv4/IPv6 to IB mcast mapping functions · a9e527e3
      Committed by Rolf Manderscheid
      An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
      use link-local scope in multicast GIDs.  The existing routines that
      map IP/IPv6 multicast addresses into IB link-level addresses hard-code
      the scope to link-local, and they also leave the partition key field
      uninitialised.  This patch adds a parameter (the link-level broadcast
      address) to the mapping routines, allowing them to initialise both the
      scope and the P_Key appropriately, and fixes up the call sites.
      
      The next step will be to add a way to configure the scope for an IPoIB
      interface.
      Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
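
The mapping described above can be sketched in user-space C. This is a minimal illustration, not the patch itself: the function name `ipv4_to_ib_mcast` is hypothetical, and the byte offsets follow the IPv4-over-InfiniBand multicast GID layout as I read it in RFC 4391, so treat them as assumptions. The point it shows is that the scope nibble and the P_Key are copied from the link-level broadcast address instead of being hard-coded or left unset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IB_ADDR_LEN 20  /* 4-byte QPN word followed by a 16-byte GID */

/*
 * Sketch: map an IPv4 multicast address (host byte order) to an IPoIB
 * link-level multicast address. Scope and P_Key are taken from the
 * interface's link-level broadcast address (assumed layout, see lead-in).
 */
void ipv4_to_ib_mcast(uint32_t addr, const uint8_t *broadcast, uint8_t *buf)
{
    assert(broadcast != NULL && buf != NULL);

    memset(buf, 0, IB_ADDR_LEN);
    buf[1] = 0xff;                          /* multicast QPN */
    buf[2] = 0xff;
    buf[3] = 0xff;
    buf[4] = 0xff;                          /* multicast GID prefix */
    buf[5] = 0x10 | (broadcast[5] & 0x0f);  /* flags nibble, scope from broadcast */
    buf[6] = 0x40;                          /* IPv4 signature 0x401b */
    buf[7] = 0x1b;
    buf[8] = broadcast[8];                  /* P_Key from broadcast */
    buf[9] = broadcast[9];
    buf[16] = (addr >> 24) & 0x0f;          /* low 28 bits of the group address */
    buf[17] = (addr >> 16) & 0xff;
    buf[18] = (addr >> 8) & 0xff;
    buf[19] = addr & 0xff;
}
```

With a broadcast address whose scope nibble is 0x5 and whose P_Key bytes are 0x80 0x01, the group 224.0.0.1 maps to an address carrying that same scope and P_Key.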
  3. 23 Jan, 2008 2 commits
  4. 21 Jan, 2008 4 commits
  5. 10 Jan, 2008 1 commit
  6. 09 Jan, 2008 4 commits
    • [LRO] Fix lro_mgr->features checks · 877364e6
      Committed by Brice Goglin
      lro_mgr->features contains a bitmask of LRO_F_* values, which are
      defined as powers of two, not as bit indexes.
      They must be checked with x&LRO_F_FOO, not with test_bit(LRO_F_FOO,&x).
      Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
      Acked-by: Andrew Gallatin <gallatin@myri.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
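
The difference between a mask test and a bit-index test is easy to see in a small user-space sketch. The `DEMO_F_*` flags and both helper names are hypothetical stand-ins; `bit_index_is_set` only imitates what a test_bit()-style helper does with its first argument.

```c
#include <assert.h>

/* Hypothetical flags, defined as powers of two (masks), like LRO_F_*. */
#define DEMO_F_NAPI            1UL
#define DEMO_F_EXTRACT_VLAN_ID 2UL

/* Correct check for mask-style flags: AND the word with the mask. */
int flag_is_set(unsigned long features, unsigned long flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0);  /* exactly one bit set */
    return (features & flag) != 0;
}

/* A test_bit()-style check: the argument is a bit *index*, not a mask. */
int bit_index_is_set(unsigned long features, unsigned int index)
{
    assert(index < 8 * sizeof(features));
    return (int)((features >> index) & 1UL);
}
```

Passing the mask value 2 as an index tests bit 2 (value 4), so a word with only `DEMO_F_EXTRACT_VLAN_ID` set is reported as not having it.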
    • [IPV4] ROUTE: ip_rt_dump() is unecessary slow · d8c92830
      Committed by Eric Dumazet
      I noticed "ip route list cache x.y.z.t" can be *very* slow.
      
      While running it under strace -T, I also noticed that the first part of
      the route cache is fetched quite fast:
      
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>
      
      while the part at the end of the table is more expensive:
      
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>
      
      The following patch corrects this performance/latency problem,
      removing quadratic behavior.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
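
One common way a resumable table dump becomes quadratic is by rescanning from the first bucket on every call and skipping what was already sent. The sketch below contrasts that with resuming directly at the saved bucket. It is a generic illustration under that assumption, not the kernel's ip_rt_dump(); `struct dump_pos`, `dump_rescan` and `dump_resume` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Position saved between two calls of a resumable dump (hypothetical). */
struct dump_pos {
    size_t bucket;  /* first bucket not yet dumped */
};

/*
 * Both functions emit up to `batch` non-empty buckets per call and return
 * how many bucket slots they had to examine. counts[i] is the number of
 * entries in bucket i.
 */
size_t dump_rescan(const int *counts, size_t nbuckets,
                   struct dump_pos *pos, size_t batch)
{
    size_t visited = 0, emitted = 0, h;

    assert(counts != NULL && pos != NULL && batch > 0);
    for (h = 0; h < nbuckets && emitted < batch; h++) {
        visited++;
        if (h < pos->bucket)
            continue;           /* already dumped: wasted work */
        if (counts[h] > 0)
            emitted++;
    }
    pos->bucket = h;
    return visited;
}

size_t dump_resume(const int *counts, size_t nbuckets,
                   struct dump_pos *pos, size_t batch)
{
    size_t visited = 0, emitted = 0, h;

    assert(counts != NULL && pos != NULL && batch > 0);
    for (h = pos->bucket; h < nbuckets && emitted < batch; h++) {
        visited++;              /* start at the saved bucket */
        if (counts[h] > 0)
            emitted++;
    }
    pos->bucket = h;
    return visited;
}
```

For 8 non-empty buckets dumped 2 per call, the rescanning variant examines 2 + 4 + 6 + 8 = 20 slots while the resuming variant examines 8, which matches the pattern of later recvmsg() calls getting steadily more expensive.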
    • [IPV4] ipconfig: Fix regression in ip command line processing · 92ffb85d
      Committed by Amos Waterland
      The recent changes for ip command line processing fixed some problems
      but unfortunately broke some common usage scenarios.  In current
      2.6.24-rc6 the following command line results in no IP address
      assignment, which is surely a regression:
      
       ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off
      
      Please find below a patch that works for all cases I can find.
      Signed-off-by: Amos Waterland <apw@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
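
To make the example command line easier to read, here is a small user-space sketch that splits an ip= value into its colon-separated fields while keeping empty ones. It is not the kernel's parser; `struct ip_arg` and `split_ip_arg` are hypothetical names, and the field order (client, server, gateway, netmask, hostname, device, autoconf) is taken from the nfsroot documentation as I understand it.

```c
#include <assert.h>
#include <string.h>

#define IP_ARG_FIELDS 7
#define IP_FIELD_MAX  64

/* Fields in order: client, server, gateway, netmask, hostname, device, autoconf. */
struct ip_arg {
    char field[IP_ARG_FIELDS][IP_FIELD_MAX];
    int count;
};

/* Split on ':' and keep empty fields. Returns the field count, or -1 on error. */
int split_ip_arg(const char *arg, struct ip_arg *out)
{
    assert(arg != NULL && out != NULL);

    memset(out, 0, sizeof(*out));
    while (out->count < IP_ARG_FIELDS) {
        size_t len = strcspn(arg, ":");

        if (len >= IP_FIELD_MAX)
            return -1;                  /* field too long for this sketch */
        memcpy(out->field[out->count], arg, len);
        out->field[out->count][len] = '\0';
        out->count++;
        if (arg[len] == '\0')
            return out->count;          /* reached the end of the string */
        arg += len + 1;                 /* skip the ':' separator */
    }
    return -1;                          /* more than seven fields */
}
```

Applied to `10.0.2.15::10.0.2.2:255.255.255.0::eth0:off`, this yields seven fields: a static client address, an empty server, a gateway, a netmask, an empty hostname, the device eth0, and `off` in the autoconf slot.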
    • [IPV4] raw: Strengthen check on validity of iph->ihl · f844c74f
      Committed by Herbert Xu
      We currently check that iph->ihl is bounded by the real length and that
      the real length is greater than the minimum IP header length.  However,
      we do not check the case where iph->ihl is less than the minimum IP
      header length.
      
      This breaks because some ip_fast_csum implementations assume that
      iph->ihl is at least the minimum, which is quite reasonable.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
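
The three conditions discussed above can be written as one validation helper. This is a sketch of the checks as the message describes them, not the code in the raw socket path; `ihl_is_valid` and `MIN_IPV4_HDR_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_IPV4_HDR_LEN 20u  /* an IPv4 header without options */

/*
 * ihl is the 4-bit header-length field, counted in 32-bit words.
 * length is the number of bytes actually supplied by the caller.
 * Returns 1 when the header length is usable, 0 otherwise.
 */
int ihl_is_valid(unsigned int ihl, size_t length)
{
    size_t iphlen = (size_t)ihl * 4;

    assert(ihl <= 15);                  /* 4-bit field */
    if (length < MIN_IPV4_HDR_LEN)
        return 0;                       /* too short for any IPv4 header */
    if (iphlen > length)
        return 0;                       /* header claims more than was sent */
    if (iphlen < MIN_IPV4_HDR_LEN)
        return 0;                       /* the case that was not checked before */
    return 1;
}
```

An ihl of 2 with 40 bytes of data passes the first two checks and is only rejected by the third.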
  7. 04 Jan, 2008 1 commit
    • [INET]: Fix netdev renaming and inet address labels · 44344b2a
      Committed by Mark McLoughlin
      When re-naming an interface, the previous secondary address
      labels get lost e.g.
      
        $> brctl addbr foo
        $> ip addr add 192.168.0.1 dev foo
        $> ip addr add 192.168.0.2 dev foo label foo:00
        $> ip addr show dev foo | grep inet
          inet 192.168.0.1/32 scope global foo
          inet 192.168.0.2/32 scope global foo:00
        $> ip link set foo name bar
        $> ip addr show dev bar | grep inet
          inet 192.168.0.1/32 scope global bar
          inet 192.168.0.2/32 scope global bar:2
      
      Turns out to be a simple thinko in inetdev_changename() - clearly we
      want to look at the address label, rather than the device name, for
      a suffix to retain.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
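
The intended behaviour (keep the ":suffix" of the old address label when the device is renamed) can be sketched as follows. This is an illustration only: `relabel` and `LABEL_MAX` are hypothetical, and unlike inetdev_changename() the sketch simply reports an error when the result would not fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LABEL_MAX 16  /* assumed label buffer size for this sketch */

/*
 * Build the new label for an address after its device is renamed.
 * The suffix is looked up in the old *label*, not in the device name.
 * Returns 0 on success, -1 if the result would not fit in LABEL_MAX.
 */
int relabel(const char *old_label, const char *new_dev, char *out)
{
    const char *suffix;

    assert(old_label != NULL && new_dev != NULL && out != NULL);

    suffix = strchr(old_label, ':');    /* e.g. ":00" in "foo:00" */
    if (suffix == NULL)
        suffix = "";                    /* primary address: no suffix */
    if (strlen(new_dev) + strlen(suffix) >= LABEL_MAX)
        return -1;
    snprintf(out, LABEL_MAX, "%s%s", new_dev, suffix);
    return 0;
}
```

Renaming foo to bar turns the label foo:00 into bar:00, rather than the bar:2 shown in the transcript above.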
  8. 30 Dec, 2007 1 commit
    • [TCP]: use non-delayed ACK for congestion control RTT · 2072c228
      Committed by Gavin McCullagh
      When a delayed ACK representing two packets arrives, there are two RTT
      samples available, one for each packet.  The first (in order of seq
      number) will be artificially long due to the delay waiting for the
      second packet; the second will trigger the ACK and so will not itself
      be delayed.
      
      According to RFC 1323, the SRTT used for RTO calculation should use the
      first RTT, so receivers echo the timestamp from the first packet in
      the delayed ACK.  For congestion control, however, measuring the
      delayed-ACK delay does not seem desirable, as it varies independently
      of congestion.
      
      The patch below causes seq_rtt and last_ackt to be updated with any
      available later packet rtts which should have less (and hopefully
      zero) delack delay.  The rtt value then gets passed to
      ca_ops->pkts_acked().
      
      Where TCP_CONG_RTT_STAMP was set, effort was made to suppress RTTs from
      within a TSO chunk (!fully_acked), using only the final ACK (which
      includes any TSO delay) to generate RTTs.  This patch removes these
      checks so RTTs are passed for each ACK to ca_ops->pkts_acked().
      
      For non-delay based congestion control (cubic, h-tcp), rtt is
      sometimes used for rtt-scaling.  In shortening the RTT, this may make
      them a little less aggressive.  Delay-based schemes (eg vegas, veno,
      illinois) should get a cleaner, more accurate congestion signal,
      particularly for small cwnds.  The congestion control module can
      potentially also filter out bad RTTs due to the delayed ack alarm by
      looking at the associated cnt which (where delayed acking is in use)
      should probably be 1 if the alarm went off or greater if the ACK was
      triggered by a packet.
      Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
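
The two RTT samples carried by one delayed ACK can be illustrated with a few lines of C. This is a simplified model, not the tcp_clean_rtx_queue() logic: `struct rtt_pair` and `rtt_samples` are hypothetical, and timestamps are plain integers in an arbitrary unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rtt_pair {
    uint32_t rto_rtt;  /* measured from the first packet covered by the ACK */
    uint32_t cc_rtt;   /* measured from the last packet covered by the ACK */
};

/*
 * sent_at holds the transmit times of the packets covered by one ACK, in
 * sequence order; ack_at is the arrival time of that ACK. Assumes the
 * timestamps are non-decreasing and not later than ack_at.
 */
struct rtt_pair rtt_samples(const uint32_t *sent_at, size_t n, uint32_t ack_at)
{
    struct rtt_pair r;

    assert(sent_at != NULL && n > 0);
    r.rto_rtt = ack_at - sent_at[0];      /* includes the delayed-ACK wait */
    r.cc_rtt = ack_at - sent_at[n - 1];   /* the packet that triggered the ACK */
    return r;
}
```

With packets sent at 100 and 140 and the ACK arriving at 200, the RTO sample is 100 while the congestion-control sample is 60; the 40-unit difference is the wait for the second packet, which says nothing about congestion.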
  9. 29 Dec, 2007 1 commit
    • [IPV4] Fix ip=dhcp regression · 9cecd07c
      Committed by Simon Horman
      David Brownell pointed out a regression in my recent "Fix ip command
      line processing" patch. It turns out to be a fairly blatant oversight on
      my part whereby ic_enable is never set, and thus autoconfiguration is
      never enabled. Clearly my testing was broken :-(
      
      The solution that I have is to set ic_enable to 1 if we hit
      ip_auto_config_setup(), which basically means that autoconfiguration is
      activated unless told otherwise. I then flip ic_enable to 0 if ip=off,
      ip=none, ip=::::::off or ip=::::::none using ic_proto_name().
      
      The incremental patch is below; let me know if a non-incremental version
      is preferred, as I did ask for the original patch to be reverted pending a
      fix.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 27 Dec, 2007 2 commits
    • [IPV4]: Fix ip command line processing. · a6c05c3d
      Committed by Simon Horman
      Recently the documentation in Documentation/nfsroot.txt was
      updated to note that ip=off and ip=::::::off are in fact not
      equivalent, as the latter is ignored and the default (on) is used.
      
      This was certainly a step in the direction of reducing confusion.
      But it seems to me that the code ought to be fixed up so that
      ip=::::::off actually turns off ip autoconfiguration.
      
      This patch also notes more specifically that ip=on (aka ip=::::::on)
      is the default.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility · fae718dd
      Committed by Patrick McHardy
      Some users do "modprobe ip_conntrack hashsize=...". Because of the
      module aliases, this loads nf_conntrack_ipv4 and nf_conntrack.
      However, the hashsize parameter is unknown to nf_conntrack_ipv4, so
      loading it fails.
      
      Allow hashsize= to be specified for both nf_conntrack and
      nf_conntrack_ipv4.
      
      Note: the nf_conntrack message in the ringbuffer will display an
      incorrect hashsize since nf_conntrack is first pulled in as a
      dependency and calculates the size itself, then it gets changed
      through a call to nf_conntrack_set_hashsize().
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 21 Dec, 2007 2 commits
  12. 20 Dec, 2007 2 commits
  13. 17 Dec, 2007 1 commit
  14. 15 Dec, 2007 2 commits
  15. 11 Dec, 2007 2 commits
  16. 07 Dec, 2007 2 commits
  17. 05 Dec, 2007 6 commits
  18. 03 Dec, 2007 1 commit
  19. 29 Nov, 2007 2 commits
    • [TCP] illinois: Incorrect beta usage · a357dde9
      Committed by Stephen Hemminger
      Lachlan Andrew observed that my TCP-Illinois implementation uses the
      beta value incorrectly:
        The parameter  beta  in the paper specifies the amount to decrease
        *by*:  that is, on loss,
           W <-  W -  beta*W
        but in   tcp_illinois_ssthresh() uses  beta  as the amount
        to decrease  *to*: W <- beta*W
      
      This bug makes the Linux TCP-Illinois less aggressive on an uncongested
      network, hurting performance. Note: since the base beta value is .5, it
      has no impact on a congested network.
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
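
The two readings of beta can be compared numerically. The sketch below uses a fixed-point beta with 6 fractional bits and a floor of 2 segments; both choices, like the function names, are assumptions made for illustration and are not claimed to match tcp_illinois.c line for line.

```c
#include <assert.h>
#include <stdint.h>

#define BETA_SHIFT 6
#define BETA_SCALE (1u << BETA_SHIFT)  /* beta == BETA_SCALE means 1.0 */

/* Paper's meaning: on loss, W <- W - beta*W (decrease *by* beta). */
uint32_t ssthresh_decrease_by(uint32_t cwnd, uint32_t beta)
{
    uint32_t w;

    assert(beta <= BETA_SCALE);
    w = cwnd - ((cwnd * beta) >> BETA_SHIFT);
    return w > 2 ? w : 2;
}

/* Buggy meaning: on loss, W <- beta*W (decrease *to* beta). */
uint32_t ssthresh_decrease_to(uint32_t cwnd, uint32_t beta)
{
    uint32_t w;

    assert(beta <= BETA_SCALE);
    w = (cwnd * beta) >> BETA_SHIFT;
    return w > 2 ? w : 2;
}
```

For a window of 64 and beta = 1/8, decreasing by beta leaves 56 segments while decreasing to beta leaves only 8. At beta = 1/2 both give 32, which is why the bug is invisible at the base beta value.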
    • [INET]: Fix inet_diag register vs rcv race · 07693198
      Committed by Pavel Emelyanov
      The following race is possible when one CPU unregisters the handler
      while another is trying to receive a message and call that handler:
      
      CPU1:                                                 CPU2:
      inet_diag_rcv()                                       inet_diag_unregister()
        mutex_lock(&inet_diag_mutex);
        netlink_rcv_skb(skb, &inet_diag_rcv_msg);
          if (inet_diag_table[nlh->nlmsg_type] == 
                                     NULL) /* false handler is still registered */
          ...
          netlink_dump_start(idiagnl, skb, nlh,
                                 inet_diag_dump, NULL);
                 cb = kzalloc(sizeof(*cb), GFP_KERNEL);
                         /* sleep here freeing memory 
                          * or preempt
                          * or sleep later on nlk->cb_mutex
                          */
                                                               spin_lock(&inet_diag_register_lock);
                                                               inet_diag_table[type] = NULL;
          ...                                                  spin_unlock(&inet_diag_register_lock);
                                                               synchronize_rcu();
                                                               /* CPU1 is sleeping - RCU quiescent
                                                                * state is passed
                                                                */
                                                               return;
          /* inet_diag_dump is finally called: */
          inet_diag_dump()
            handler = inet_diag_table[cb->nlh->nlmsg_type];
            BUG_ON(handler == NULL); 
            /* OOPS! While we slept the unregister has set
             * handler to NULL :(
             */
      
      Grep showed that the register/unregister functions are called
      from the init/fini module callbacks for tcp_/dccp_diag, so it's OK
      to use the inet_diag_mutex to synchronize manipulations of the
      inet_diag_table and accesses to it.
      
      Besides, as Herbert pointed out, asynchronous dumps should hold
      this mutex as well, and thus we provide it as the cb_mutex.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
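
The locking rule described above (one mutex covers both table updates and the lookup-plus-call) can be sketched in user space with a POSIX mutex. Everything here is hypothetical scaffolding: the `diag_*` names, the table size and the demo handler are not kernel symbols, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DIAG_TYPES 4

typedef int (*diag_handler)(int arg);

static pthread_mutex_t diag_mutex = PTHREAD_MUTEX_INITIALIZER;
static diag_handler diag_table[DIAG_TYPES];

/* Demo handler used only to exercise the table. */
int demo_double(int arg)
{
    return 2 * arg;
}

/* Register a handler; returns 0 on success, -1 if the slot is taken. */
int diag_register(int type, diag_handler h)
{
    int ret = -1;

    assert(type >= 0 && type < DIAG_TYPES && h != NULL);
    pthread_mutex_lock(&diag_mutex);
    if (diag_table[type] == NULL) {
        diag_table[type] = h;
        ret = 0;
    }
    pthread_mutex_unlock(&diag_mutex);
    return ret;
}

/* Unregister waits for any dispatch that currently holds the mutex. */
void diag_unregister(int type)
{
    assert(type >= 0 && type < DIAG_TYPES);
    pthread_mutex_lock(&diag_mutex);
    diag_table[type] = NULL;
    pthread_mutex_unlock(&diag_mutex);
}

/*
 * Look up and call the handler under the same mutex, so the handler
 * cannot disappear between the lookup and the call. Returns 0 on
 * success, -1 when no handler is registered.
 */
int diag_dispatch(int type, int arg, int *result)
{
    int ret = -1;

    assert(type >= 0 && type < DIAG_TYPES && result != NULL);
    pthread_mutex_lock(&diag_mutex);
    if (diag_table[type] != NULL) {
        *result = diag_table[type](arg);
        ret = 0;
    }
    pthread_mutex_unlock(&diag_mutex);
    return ret;
}
```

A dispatch that arrives after unregistration gets an error return instead of dereferencing a NULL handler, and an unregistration that arrives during a dispatch blocks until the handler has finished.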
  20. 26 Nov, 2007 1 commit
  21. 23 Nov, 2007 1 commit