1. 12 2月, 2011 1 次提交
  2. 04 2月, 2011 2 次提交
  3. 01 2月, 2011 2 次提交
    • P
      netfilter: arpt_mangle: fix return values of checkentry · 9d0db8b6
      Pablo Neira Ayuso 提交于
      In 135367b8 "netfilter: xtables: change xt_target.checkentry return type",
      the type returned by checkentry was changed from boolean to int, but the
      return values where not adjusted.
      
      arptables: Input/output error
      
      This broke arptables with the mangle target since it returns true
      under success, which is interpreted by xtables as >0, thus
      returning EIO.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      9d0db8b6
    • R
      net: Add default_mtu() methods to blackhole dst_ops · ec831ea7
      Roland Dreier 提交于
      When an IPSEC SA is still being set up, __xfrm_lookup() will return
      -EREMOTE and so ip_route_output_flow() will return a blackhole route.
      This can happen in a sndmsg call, and after d33e4553 ("net: Abstract
      default MTU metric calculation behind an accessor.") this leads to a
      crash in ip_append_data() because the blackhole dst_ops have no
      default_mtu() method and so dst_mtu() calls a NULL pointer.
      
      Fix this by adding default_mtu() methods (that simply return 0, matching
      the old behavior) to the blackhole dst_ops.
      
      The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
      VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
      looks to be needed as well.
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec831ea7
  4. 30 1月, 2011 1 次提交
    • E
      net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT · 709b46e8
      Eric W. Biederman 提交于
      SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
      which unfortunately means the existing infrastructure for compat networking
      ioctls is insufficient.  A trivial compact ioctl implementation would conflict
      with:
      
      SIOCAX25ADDUID
      SIOCAIPXPRISLT
      SIOCGETSGCNT_IN6
      SIOCGETSGCNT
      SIOCRSSCAUSE
      SIOCX25SSUBSCRIP
      SIOCX25SDTEFACILITIES
      
      To make this work I have updated the compat_ioctl decode path to mirror the
      the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl function
      so that I can have ipv4 specific compat ioctls.   I have added a compat_ioctl
      function into struct proto so I can break out ioctls by which kind of ip socket
      I am using.  I have added a compat_raw_ioctl function because SIOCGETSGCNT only
      works on raw sockets.  I have added a ipmr_compat_ioctl that mirrors the normal
      ipmr_ioctl.
      
      This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
      has unsigned longs in it so changes between 32bit and 64bit kernels.
      
      This change was sufficient to run a 32bit ip multicast routing daemon on a
      64bit kernel.
      Reported-by: NBill Fenner <fenner@aristanetworks.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      709b46e8
  5. 26 1月, 2011 1 次提交
    • J
      TCP: fix a bug that triggers large number of TCP RST by mistake · 44f5324b
      Jerry Chu 提交于
      This patch fixes a bug that causes TCP RST packets to be generated
      on otherwise correctly behaved applications, e.g., no unread data
      on close,..., etc. To trigger the bug, at least two conditions must
      be met:
      
      1. The FIN flag is set on the last data packet, i.e., it's not on a
      separate, FIN only packet.
      2. The size of the last data chunk on the receive side matches
      exactly with the size of buffer posted by the receiver, and the
      receiver closes the socket without any further read attempt.
      
      This bug was first noticed on our netperf based testbed for our IW10
      proposal to IETF where a large number of RST packets were observed.
      netperf's read side code meets the condition 2 above 100%.
      
      Before the fix, tcp_data_queue() will queue the last skb that meets
      condition 1 to sk_receive_queue even though it has fully copied out
      (skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
      tcp_recvmsg() often returns all the copied out data successfully
      without actually consuming the skb, due to a check
      "if ((chunk = len - tp->ucopy.len) != 0) {"
      and
      "len -= chunk;"
      after tcp_prequeue_process() that causes "len" to become 0 and an
      early exit from the big while loop.
      
      I don't see any reason not to free the skb whose data have been fully
      consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
      get there if MSG_PEEK is on. Am I missing some arcane cases related
      to urgent data?
      Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44f5324b
  6. 25 1月, 2011 3 次提交
  7. 20 1月, 2011 1 次提交
  8. 12 1月, 2011 2 次提交
  9. 11 1月, 2011 2 次提交
  10. 10 1月, 2011 1 次提交
  11. 07 1月, 2011 2 次提交
    • P
      netfilter: fix export secctx error handling · cba85b53
      Pablo Neira Ayuso 提交于
      In 1ae4de0c, the secctx was exported
      via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces
      instead of the secmark.
      
      That patch introduced the use of security_secid_to_secctx() which may
      return a non-zero value on error.
      
      In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no
      security modules. Thus, security_secid_to_secctx() returns a negative
      value that results in the breakage of the /proc and `conntrack -L'
      outputs. To fix this, we skip the inclusion of secctx if the
      aforementioned function fails.
      
      This patch also fixes the dynamic netlink message size calculation
      if security_secid_to_secctx() returns an error, since its logic is
      also wrong.
      
      This problem exists in Linux kernel >= 2.6.37.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cba85b53
    • E
      ipv4: IP defragmentation must be ECN aware · 6623e3b2
      Eric Dumazet 提交于
      RFC3168 (The Addition of Explicit Congestion Notification to IP)
      states :
      
      5.3.  Fragmentation
      
         ECN-capable packets MAY have the DF (Don't Fragment) bit set.
         Reassembly of a fragmented packet MUST NOT lose indications of
         congestion.  In other words, if any fragment of an IP packet to be
         reassembled has the CE codepoint set, then one of two actions MUST be
         taken:
      
            * Set the CE codepoint on the reassembled packet.  However, this
              MUST NOT occur if any of the other fragments contributing to
              this reassembly carries the Not-ECT codepoint.
      
            * The packet is dropped, instead of being reassembled, for any
              other reason.
      
      This patch implements this requirement for IPv4, choosing the first
      action :
      
      If one fragment had NO-ECT codepoint
              reassembled frame has NO-ECT
      ElIf one fragment had CE codepoint
              reassembled frame has CE
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6623e3b2
  12. 05 1月, 2011 1 次提交
    • J
      ipv4/route.c: respect prefsrc for local routes · 9fc3bbb4
      Joel Sing 提交于
      The preferred source address is currently ignored for local routes,
      which results in all local connections having a src address that is the
      same as the local dst address. Fix this by respecting the preferred source
      address when it is provided for local routes.
      
      This bug can be demonstrated as follows:
      
       # ifconfig dummy0 192.168.0.1
       # ip route show table local | grep local.*dummy0
       local 192.168.0.1 dev dummy0  proto kernel  scope host  src 192.168.0.1
       # ip route change table local local 192.168.0.1 dev dummy0 \
           proto kernel scope host src 127.0.0.1
       # ip route show table local | grep local.*dummy0
       local 192.168.0.1 dev dummy0  proto kernel  scope host  src 127.0.0.1
      
      We now establish a local connection and verify the source IP
      address selection:
      
       # nc -l 192.168.0.1 3128 &
       # nc 192.168.0.1 3128 &
       # netstat -ant | grep 192.168.0.1:3128.*EST
       tcp        0      0 192.168.0.1:3128        192.168.0.1:33228 ESTABLISHED
       tcp        0      0 192.168.0.1:33228       192.168.0.1:3128  ESTABLISHED
      Signed-off-by: NJoel Sing <jsing@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fc3bbb4
  13. 26 12月, 2010 1 次提交
  14. 24 12月, 2010 3 次提交
  15. 21 12月, 2010 2 次提交
  16. 17 12月, 2010 2 次提交
  17. 15 12月, 2010 1 次提交
  18. 14 12月, 2010 2 次提交
  19. 13 12月, 2010 2 次提交
    • D
      ipv4: Don't pre-seed hoplimit metric. · 323e126f
      David S. Miller 提交于
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go as it's no longer
         userd.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into
      ipv4's route cache definition header net/route.h, because
      currently net/ip.h (where the declaration lives now) has
      a back dependency on net/route.h
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      323e126f
    • D
      5170ae82
  20. 11 12月, 2010 1 次提交
  21. 10 12月, 2010 2 次提交
    • E
      net: optimize INET input path further · 68835aba
      Eric Dumazet 提交于
      Followup of commit b178bb3d (net: reorder struct sock fields)
      
      Optimize INET input path a bit further, by :
      
      1) moving sk_refcnt close to sk_lock.
      
      This reduces number of dirtied cache lines by one on 64bit arches (and
      64 bytes cache line size).
      
      2) moving inet_daddr & inet_rcv_saddr at the beginning of sk
      
      (same cache line than hash / family / bound_dev_if / nulls_node)
      
      This reduces number of accessed cache lines in lookups by one, and dont
      increase size of inet and timewait socks.
      inet and tw sockets now share same place-holder for these fields.
      
      Before patch :
      
      offsetof(struct sock, sk_refcnt) = 0x10
      offsetof(struct sock, sk_lock) = 0x40
      offsetof(struct sock, sk_receive_queue) = 0x60
      offsetof(struct inet_sock, inet_daddr) = 0x270
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
      
      After patch :
      
      offsetof(struct sock, sk_refcnt) = 0x44
      offsetof(struct sock, sk_lock) = 0x48
      offsetof(struct sock, sk_receive_queue) = 0x68
      offsetof(struct inet_sock, inet_daddr) = 0x0
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
      
      compute_score() (udp or tcp) now use a single cache line per ignored
      item, instead of two.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68835aba
    • D
      net: Abstract away all dst_entry metrics accesses. · defb3519
      David S. Miller 提交于
      Use helper functions to hide all direct accesses, especially writes,
      to dst_entry metrics values.
      
      This will allow us to:
      
      1) More easily change how the metrics are stored.
      
      2) Implement COW for metrics.
      
      In particular this will help us put metrics into the inetpeer
      cache if that is what we end up doing.  We can make the _metrics
      member a pointer instead of an array, initially have it point
      at the read-only metrics in the FIB, and then on the first set
      grab an inetpeer entry and point the _metrics member there.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      defb3519
  22. 09 12月, 2010 5 次提交