1. 22 Mar, 2008 1 commit
  2. 21 Mar, 2008 2 commits
    • [TCP]: Fix shrinking windows with window scaling · 607bfbf2
      Patrick McHardy authored
      When selecting a new window, tcp_select_window() tries not to shrink
      the offered window by using the maximum of the remaining offered window
      size and the newly calculated window size. The newly calculated window
      size is always a multiple of the window scaling factor, the remaining
      window size however might not be since it depends on rcv_wup/rcv_nxt.
      This means we're effectively shrinking the window when scaling it down.
      
      
      The dump below shows the problem (scaling factor 2^7):
      
      - Window size of 557 (71296) is advertised, up to 3111907257:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
      
      - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
        below the last end:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
      
      The number 40 results from downscaling the remaining window:
      
      3111907257 - 3111841425 = 65832
      65832 / 2^7 = 514
      65832 % 2^7 = 40
      
      If the sender uses up the entire window before it is shrunk, this can have
      chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
      will notice that the window has been shrunk since tcp_wnd_end() is before
      tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
      This will fail the receiver's checks in tcp_sequence(), however, since it
      is before its tp->rcv_wup, making it respond with a dupack.
      
      If both sides are in this condition, this leads to a constant flood of
      ACKs until the connection times out.
      
      Make sure the window is never shrunk by aligning the remaining window to
      the window scaling factor.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
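The arithmetic in the dump above can be reproduced with a short sketch. It is written in Python rather than kernel C, the helper names (`scaled_field`, `advertised_end`, `align_up`) are hypothetical, and rounding the remaining window up to the next multiple of the scaling factor is an assumption about what "aligning" means here, not a quote from the patch.

```python
WSCALE = 7  # scaling factor 2^7, as in the dump


def scaled_field(window_bytes, wscale=WSCALE):
    """Value placed in the 16-bit window field: the low bits are dropped."""
    return window_bytes >> wscale


def advertised_end(ack, win_field, wscale=WSCALE):
    """Right edge of the window as the peer reconstructs it."""
    return ack + (win_field << wscale)


def align_up(window_bytes, wscale=WSCALE):
    """Round up to a multiple of 2^wscale (assumed meaning of 'align')."""
    unit = 1 << wscale
    return (window_bytes + unit - 1) & ~(unit - 1)


# First segment in the dump: ack 3111835961, win 557.
old_end = advertised_end(3111835961, 557)

# Second segment: ack 3111841425, window derived from what is left.
remaining = old_end - 3111841425
shrunk_end = advertised_end(3111841425, scaled_field(remaining))

# Same segment with the remaining window aligned before scaling.
fixed_end = advertised_end(3111841425, scaled_field(align_up(remaining)))
```

With these numbers `old_end - shrunk_end` is the 40 bytes mentioned in the message, while `fixed_end` is not below `old_end`, so the right edge never moves backwards.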
    • [NETFILTER]: ipt_recent: sanity check hit count · d0ebf133
      Daniel Hokka Zakrisson authored
      If a rule using ipt_recent is created with a hit count greater than
      ip_pkt_list_tot, the rule will never match as it cannot keep track
      of enough timestamps. This patch makes ipt_recent refuse to create such
      rules.
      
      With ip_pkt_list_tot's default value of 20, the following can be used
      to reproduce the problem.
      
      nc -u -l 0.0.0.0 1234 &
      for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done
      
      This limits it to 20 packets:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 20 --name test --rsource -j DROP
      
      While this is unlimited:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 21 --name test --rsource -j DROP
      
      With the patch the second rule-set will throw an EINVAL.
      Reported-by: Sean Kennedy <skennedy@vcn.com>
      Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
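Why a hit count above ip_pkt_list_tot can never match is easy to see in a toy model. The sketch below is Python, not the module's code: `count_recent_hits` and `check_hitcount` are hypothetical names, and the bounded deque is only an assumed stand-in for the per-address timestamp storage.

```python
from collections import deque

IP_PKT_LIST_TOT = 20  # default value quoted in the commit message


def count_recent_hits(packet_times, now, seconds, list_tot=IP_PKT_LIST_TOT):
    """Count remembered timestamps that fall within the last `seconds`."""
    stamps = deque(maxlen=list_tot)  # only the newest list_tot stamps survive
    for t in packet_times:
        stamps.append(t)
    return sum(1 for t in stamps if now - t <= seconds)


def check_hitcount(hitcount, list_tot=IP_PKT_LIST_TOT):
    """Sanity check in the spirit of the patch: reject impossible rules."""
    if hitcount > list_tot:
        # The kernel reports EINVAL; ValueError plays that role here.
        raise ValueError("hitcount exceeds ip_pkt_list_tot")
    return True


# 100 packets, one per second, all inside a 60 second window.
hits = count_recent_hits(range(100), now=99, seconds=60)
```

However many packets arrive, `hits` is capped at 20, so `--hitcount 20` can trigger while `--hitcount 21` cannot.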
  3. 18 Mar, 2008 2 commits
  4. 12 Mar, 2008 1 commit
  5. 05 Mar, 2008 2 commits
    • [IPCONFIG]: The kernel gets no IP from some DHCP servers · dea75bdf
      Stephen Hemminger authored
      From: Stephen Hemminger <shemminger@linux-foundation.org>
      
      Based upon a patch by Marcel Wappler:
       
         This patch fixes a DHCP issue of the kernel: some DHCP servers
         (e.g. in the Linksys WRT54Gv5) are very strict about the contents
         of the DHCPDISCOVER packet they receive from clients.
       
         Table 5 in RFC2131 page 36 requests the fields 'ciaddr' and
         'siaddr' MUST be set to '0'.  These DHCP servers ignore Linux
         kernel's DHCP discovery packets with these two fields set to
         '255.255.255.255' (in contrast to popular DHCP clients, such as
         'dhclient' or 'udhcpc').  This leads to a system that fails to boot.
      Signed-off-by: David S. Miller <davem@davemloft.net>
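To make the two fields concrete, here is a minimal Python sketch that packs the fixed 236-byte BOOTP header used by DHCP, following the field order in RFC 2131. The function name `build_bootp_header` and the sample MAC address and transaction id are made up for illustration; DHCP options (including the message type) are left out.

```python
import ipaddress
import struct

BOOTREQUEST = 1
HTYPE_ETHERNET = 1


def build_bootp_header(xid, mac, ciaddr="0.0.0.0", siaddr="0.0.0.0"):
    """Pack the fixed part of a DHCP message; options are not included."""
    zero = ipaddress.IPv4Address("0.0.0.0").packed
    return struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        BOOTREQUEST, HTYPE_ETHERNET, len(mac), 0,  # op, htype, hlen, hops
        xid, 0, 0,                                 # xid, secs, flags
        ipaddress.IPv4Address(ciaddr).packed,      # ciaddr: 0 in DHCPDISCOVER
        zero,                                      # yiaddr
        ipaddress.IPv4Address(siaddr).packed,      # siaddr: 0 in DHCPDISCOVER
        zero,                                      # giaddr
        mac, b"", b"",                             # chaddr, sname, file (zero padded)
    )


mac = bytes.fromhex("001122334455")  # hypothetical client MAC
good = build_bootp_header(0x12345678, mac)
bad = build_bootp_header(0x12345678, mac,
                         ciaddr="255.255.255.255", siaddr="255.255.255.255")
```

`good` carries zeros at byte offsets 12..15 (ciaddr) and 20..23 (siaddr); `bad` reproduces the all-ones values that the strict servers described above ignore.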
    • [ESP]: Add select on AUTHENC · ed58dd41
      Herbert Xu authored
      Now that ESP uses the AEAD interface even for algorithms which are
      not combined mode, we need to select CONFIG_CRYPTO_AUTHENC, as
      otherwise only combined mode algorithms will work.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 04 Mar, 2008 1 commit
  7. 29 Feb, 2008 3 commits
  8. 27 Feb, 2008 2 commits
    • [INET]: Don't create tunnels with '%' in name. · b37d428b
      Pavel Emelyanov authored
      Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
      pre-defined name for a device from userspace.  Since these drivers
      call register_netdevice() (rtnl_lock is held), which does _not_
      generate the device's name, this name may contain a '%' character.

      I am not sure how bad it is to have a device with a '%' in its name, but
      all the other places either use register_netdev(), which calls
      dev_alloc_name(), or explicitly call dev_alloc_name() before
      registering, i.e. they do not allow such names.

      This should have gone in before commit 34cc7b, but I forgot to number the
      patches and this one got lost, sorry.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
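The behaviour described above can be modelled in a few lines. This is a Python sketch under stated assumptions: `alloc_name` is a simplified stand-in for dev_alloc_name() that only expands a single `%d` to the first unused index, and `tunnel_name` is a hypothetical wrapper, not the drivers' code.

```python
def alloc_name(template, existing):
    """Simplified stand-in for dev_alloc_name(): expand '%d' to a free index."""
    if "%d" not in template or template.count("%") != 1:
        raise ValueError("invalid name template")
    i = 0
    while template.replace("%d", str(i)) in existing:
        i += 1
    return template.replace("%d", str(i))


def tunnel_name(requested, existing):
    """Never register a literal '%': expand the template or keep a plain name."""
    if "%" in requested:
        return alloc_name(requested, existing)
    return requested


name = tunnel_name("gre%d", {"gre0", "gre1"})
plain = tunnel_name("mytun", set())
```

In this model a request for `gre%d` yields the first free `greN`, a plain name passes through unchanged, and a name with a stray '%' is rejected instead of being registered as is.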
    • [IPV4]: Reset scope when changing address · 148f9729
      Bjorn Mork authored
      This bug bit at least one user, who had to resort to rebooting
      the system after an "ifconfig eth0 127.0.0.1" typo.

      Deleting the address and adding a new one is a less intrusive workaround.
      But I still believe this is a bug that should be fixed, one way or
      another.

      Another possibility would be to remove the scope mangling based on
      address.  This will always be incomplete (is 127/8 the only address
      space with host scope requirements?)
      
      We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured
      with a loopback address (127/8).  The scope is never reset, and will
      remain set to RT_SCOPE_HOST after changing the address. This patch
      resets the scope if the address is changed again, to restore normal
      functionality.
      Signed-off-by: Bjorn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
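The sticky scope is easiest to see in a small model. The Python sketch below is an illustration only: the `Ifaddr` class and both setter functions are hypothetical, and resetting to RT_SCOPE_UNIVERSE is an assumption about what "normal functionality" means. The numeric scope values are those of the rtnetlink constants.

```python
import ipaddress

RT_SCOPE_UNIVERSE = 0
RT_SCOPE_HOST = 254


class Ifaddr:
    """Toy interface address record with an address and a scope."""

    def __init__(self):
        self.address = None
        self.scope = RT_SCOPE_UNIVERSE


def set_address_old(ifa, addr):
    """Behaviour described in the message: scope is set but never reset."""
    ifa.address = ipaddress.IPv4Address(addr)
    if ifa.address.is_loopback:
        ifa.scope = RT_SCOPE_HOST


def set_address_fixed(ifa, addr):
    """Recompute the scope on every address change."""
    ifa.address = ipaddress.IPv4Address(addr)
    ifa.scope = RT_SCOPE_HOST if ifa.address.is_loopback else RT_SCOPE_UNIVERSE


old, fixed = Ifaddr(), Ifaddr()
for addr in ("127.0.0.1", "192.168.0.1"):  # the typo, then the correction
    set_address_old(old, addr)
    set_address_fixed(fixed, addr)
```

After the corrected address is applied, `old.scope` is still RT_SCOPE_HOST while `fixed.scope` is back to RT_SCOPE_UNIVERSE.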
  9. 24 Feb, 2008 1 commit
  10. 20 Feb, 2008 3 commits
  11. 18 Feb, 2008 3 commits
  12. 14 Feb, 2008 2 commits
  13. 13 Feb, 2008 5 commits
    • [IPSEC]: Fix bogus usage of u64 on input sequence number · b318e0e4
      Herbert Xu authored
      Al Viro spotted a bogus use of u64 on the input sequence number which
      is big-endian.  This patch fixes it by giving the input sequence number
      its own member in the xfrm_skb_cb structure.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NDISC]: Fix race in generic address resolution · 69cc64d8
      David S. Miller authored
      Frank Blaschka provided the bug report and the initial suggested fix
      for this bug.  He also validated this version of this fix.
      
      The problem is that access to neigh->arp_queue is inconsistent.  We
      grab references when dropping the lock to call
      neigh->ops->solicit(), but this does not prevent other threads of
      control from trying to send out that packet at the same time, causing
      corruption because both code paths believe they have exclusive access
      to the skb.
      
      The best option seems to be to hold the write lock on neigh->lock
      during the ->solicit() call.  I looked at all of the ndisc_ops
      implementations and this seems workable.  The only case that needs
      special care is the IPV4 ARP implementation of arp_solicit().  It
      wants to take neigh->lock as a reader to protect the header entry in
      neigh->ha during the emission of the solicitation.  We can simply
      remove the read lock calls to take care of that since holding the lock
      as a writer at the caller provides a superset of the protection
      afforded by the existing read locking.
      
      The rest of the ->solicit() implementations don't care whether the
      neigh is locked or not.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: /proc/net/route performance improvement · 8315f5d8
      Stephen Hemminger authored
      Use key/offset caching to change /proc/net/route (used by iputils route)
      from O(n^2) to O(n). With 160,000 routes, this improves the time from
      30 seconds to 1 second.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
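The effect of key/offset caching can be sketched with a toy model. Everything below is an assumption made for illustration: a sorted Python list stands in for the trie, each call reads one entry by position (the real file is read in larger chunks), and `RouteTable` and `CachedReader` are hypothetical names. The `steps` counter approximates how many leaves are walked.

```python
import bisect


class RouteTable:
    """Toy stand-in for the trie: sorted keys plus a step counter."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.steps = 0

    def walk_from_start(self, pos):
        """Uncached read: walk `pos` leaves from the first one."""
        self.steps += pos
        return self.keys[pos] if pos < len(self.keys) else None


class CachedReader:
    """Remember the last (pos, key) so the next read resumes from there."""

    def __init__(self, table):
        self.table, self.pos, self.key = table, None, None

    def read(self, pos):
        t = self.table
        if self.pos is not None and pos >= self.pos:
            # Direct lookup of the cached key, then walk only the difference.
            start = bisect.bisect_left(t.keys, self.key)
            t.steps += pos - self.pos
            idx = start + (pos - self.pos)
        else:
            t.steps += pos
            idx = pos
        if idx >= len(t.keys):
            return None
        self.pos, self.key = pos, t.keys[idx]
        return self.key


n = 1000
naive = RouteTable(range(n))
for p in range(n):
    naive.walk_from_start(p)

cached = RouteTable(range(n))
reader = CachedReader(cached)
out = [reader.read(p) for p in range(n)]
```

In this model the uncached reader walks n(n-1)/2 leaves in total while the cached one walks n-1, which mirrors the O(n^2) to O(n) change described above.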
    • fib_trie: handle empty tree · ec28cf73
      Stephen Hemminger authored
      This fixes possible problems when trie_firstleaf() returns NULL
      to trie_leafindex().
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Remove IP_TOS setting privilege checks. · e4f8b5d4
      David S. Miller authored
      Various RFCs have all sorts of things to say about the CS field of the
      DSCP value.  In particular they try to make the distinction between
      values that should be used by "user applications" and things like
      routing daemons.
      
      This seems to have influenced the CAP_NET_ADMIN check which exists for
      IP_TOS socket option settings, but in fact it has an off-by-one error
      so it wasn't allowing CS5 which is meant for "user applications" as
      well.
      
      Further adding to the inconsistency and brokenness here, IPV6 does not
      validate the DSCP values specified for the IPV6_TCLASS socket option.
      
      The actual uses of these TOS values are system specific in the
      final analysis, and these RFC recommendations are just that, "a
      recommendation".  In fact the standards very purposefully use
      "SHOULD" and "SHOULD NOT" when describing how these values can be
      used.
      
      In the final analysis the only clean way to provide consistency here
      is to remove the CAP_NET_ADMIN check.  The alternatives just don't
      work out:
      
      1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
         setups.
      
      2) If we just fix the off-by-one error in the class comparison in
         IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
         default.  So people will just ask for a sysctl to
         override that.
      
      I checked several other freely available kernel trees and they
      do not make any privilege checks in this area like we do.  For
      the BSD stacks, this goes back all the way to Stevens Volume 2
      and beyond.
      Signed-off-by: David S. Miller <davem@davemloft.net>
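The off-by-one mentioned above can be illustrated without quoting the kernel condition, which the commit message does not show. In the Python sketch below both check functions are hypothetical; the only fact relied on is that class selector CSn occupies the top three bits of the TOS byte, so CS5 is 0xa0.

```python
def class_selector_tos(n):
    """TOS byte for class selector CSn: n goes into the top three bits."""
    if not 0 <= n <= 7:
        raise ValueError("class selector must be in 0..7")
    return n << 5


CS5 = class_selector_tos(5)
PRECEDENCE_MASK = 0xE0


def privileged_off_by_one(tos):
    """Hypothetical check that also treats CS5 itself as privileged."""
    return (tos & PRECEDENCE_MASK) >= CS5


def privileged_intended(tos):
    """Hypothetical check that reserves only the classes above CS5."""
    return (tos & PRECEDENCE_MASK) > CS5
```

Under these assumptions, `privileged_off_by_one(CS5)` is true while `privileged_intended(CS5)` is false, which is the kind of boundary error the message describes. The commit itself removes the check rather than correcting the comparison.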
  14. 10 Feb, 2008 1 commit
  15. 08 Feb, 2008 2 commits
  16. 06 Feb, 2008 2 commits
  17. 05 Feb, 2008 4 commits
  18. 03 Feb, 2008 1 commit
    • [SOCK] proto: Add hashinfo member to struct proto · ab1e0a13
      Arnaldo Carvalho de Melo authored
      This way we can remove TCP and DCCP specific versions of
      
      sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
      sk->sk_prot->hash:     inet_hash is directly used, only v6 needs
                             a specific version to deal with mapped sockets
      sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly
      
      struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
      that inet_csk_get_port can find the per family routine.
      
      Now only the lookup routines receive as a parameter a struct inet_hashtable.
      
      With this we further reuse code, reducing the difference among INET transport
      protocols.
      
      Eventually work has to be done on UDP and SCTP to make them share this
      infrastructure and get as a bonus inet_diag interfaces so that iproute can be
      used with these protocols.
      
      net-2.6/net/ipv4/inet_hashtables.c:
        struct proto                       |   +8
        struct inet_connection_sock_af_ops |   +8
       2 structs changed
        __inet_hash_nolisten               |  +18
        __inet_hash                        | -210
        inet_put_port                      |   +8
        inet_bind_bucket_create            |   +1
        __inet_hash_connect                |   -8
       5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
      
      net-2.6/net/core/sock.c:
        proto_seq_show                     |   +3
       1 function changed, 3 bytes added, diff: +3
      
      net-2.6/net/ipv4/inet_connection_sock.c:
        inet_csk_get_port                  |  +15
       1 function changed, 15 bytes added, diff: +15
      
      net-2.6/net/ipv4/tcp.c:
        tcp_set_state                      |   -7
       1 function changed, 7 bytes removed, diff: -7
      
      net-2.6/net/ipv4/tcp_ipv4.c:
        tcp_v4_get_port                    |  -31
        tcp_v4_hash                        |  -48
        tcp_v4_destroy_sock                |   -7
        tcp_v4_syn_recv_sock               |   -2
        tcp_unhash                         | -179
       5 functions changed, 267 bytes removed, diff: -267
      
      net-2.6/net/ipv6/inet6_hashtables.c:
        __inet6_hash                       |   +8
       1 function changed, 8 bytes added, diff: +8
      
      net-2.6/net/ipv4/inet_hashtables.c:
        inet_unhash                        | +190
        inet_hash                          | +242
       2 functions changed, 432 bytes added, diff: +432
      
      vmlinux:
       16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
      
      /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
        tcp_v6_get_port                    |  -31
        tcp_v6_hash                        |   -7
        tcp_v6_syn_recv_sock               |   -9
       3 functions changed, 47 bytes removed, diff: -47
      
      /home/acme/git/net-2.6/net/dccp/proto.c:
        dccp_destroy_sock                  |   -7
        dccp_unhash                        | -179
        dccp_hash                          |  -49
        dccp_set_state                     |   -7
        dccp_done                          |   +1
       5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
      
      /home/acme/git/net-2.6/net/dccp/ipv4.c:
        dccp_v4_get_port                   |  -31
        dccp_v4_request_recv_sock          |   -2
       2 functions changed, 33 bytes removed, diff: -33
      
      /home/acme/git/net-2.6/net/dccp/ipv6.c:
        dccp_v6_get_port                   |  -31
        dccp_v6_hash                       |   -7
        dccp_v6_request_recv_sock          |   +5
       3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 01 Feb, 2008 2 commits