1. 09 2月, 2010 4 次提交
    • P
      netfilter: nf_conntrack: fix hash resizing with namespaces · d696c7bd
      Patrick McHardy 提交于
      As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
      size is global and not per namespace, but modifiable at runtime through
      /sys/module/nf_conntrack/hashsize. Changing the hash size will only
      resize the hash in the current namespace however, so other namespaces
      will use an invalid hash size. This can cause crashes when enlarging
      the hashsize, or false negative lookups when shrinking it.
      
      Move the hash size into the per-namespace data and only use the global
      hash size to initialize the per-namespace value when instanciating a
      new namespace. Additionally restrict hash resizing to init_net for
      now as other namespaces are not handled currently.
      
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d696c7bd
    • A
      netfilter: nf_conntrack: restrict runtime expect hashsize modifications · 13ccdfc2
      Alexey Dobriyan 提交于
      Expectation hashtable size was simply glued to a variable with no code
      to rehash expectations, so it was a bug to allow writing to it.
      Make "expect_hashsize" readonly.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      13ccdfc2
    • E
      netfilter: nf_conntrack: per netns nf_conntrack_cachep · 5b3501fa
      Eric Dumazet 提交于
      nf_conntrack_cachep is currently shared by all netns instances, but
      because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
      
      If we use a shared slab cache, one object can instantly flight between
      one hash table (netns ONE) to another one (netns TWO), and concurrent
      reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
      can be fooled without notice, because no RCU grace period has to be
      observed between object freeing and its reuse.
      
      We dont have this problem with UDP/TCP slab caches because TCP/UDP
      hashtables are global to the machine (and each object has a pointer to
      its netns).
      
      If we use per netns conntrack hash tables, we also *must* use per netns
      conntrack slab caches, to guarantee an object can not escape from one
      namespace to another one.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      [Patrick: added unique slab name allocation]
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      5b3501fa
    • P
      netfilter: nf_conntrack: fix memory corruption with multiple namespaces · 9edd7ca0
      Patrick McHardy 提交于
      As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
      conntrack, which is located in the data section, might be accidentally
      freed when a new namespace is instantiated while the untracked conntrack
      is attached to a skb because the reference count it re-initialized.
      
      The best fix would be to use a seperate untracked conntrack per
      namespace since it includes a namespace pointer. Unfortunately this is
      not possible without larger changes since the namespace is not easily
      available everywhere we need it. For now move the untracked conntrack
      initialization to the init_net setup function to make sure the reference
      count is not re-initialized and handle cleanup in the init_net cleanup
      function to make sure namespaces can exit properly while the untracked
      conntrack is in use in other namespaces.
      
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9edd7ca0
  2. 27 1月, 2010 1 次提交
  3. 20 1月, 2010 1 次提交
  4. 08 1月, 2010 1 次提交
  5. 04 1月, 2010 1 次提交
  6. 22 12月, 2009 1 次提交
  7. 16 12月, 2009 2 次提交
    • A
      tree-wide: convert open calls to remove spaces to skip_spaces() lib function · e7d2860b
      André Goddard Rosa 提交于
      Makes use of skip_spaces() defined in lib/string.c for removing leading
      spaces from strings all over the tree.
      
      It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
         text    data     bss     dec     hex filename
        64688     584     592   65864   10148 (TOTALS-BEFORE)
        64641     584     592   65817   10119 (TOTALS-AFTER)
      
      Also, while at it, if we see (*str && isspace(*str)), we can be sure to
      remove the first condition (*str) as the second one (isspace(*str)) also
      evaluates to 0 whenever *str == 0, making it redundant. In other words,
      "a char equals zero is never a space".
      
      Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
      and found occurrences of this pattern on 3 more files:
          drivers/leds/led-class.c
          drivers/leds/ledtrig-timer.c
          drivers/video/output.c
      
      @@
      expression str;
      @@
      
      ( // ignore skip_spaces cases
      while (*str &&  isspace(*str)) { \(str++;\|++str;\) }
      |
      - *str &&
      isspace(*str)
      )
      Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7d2860b
    • S
      ipvs: zero usvc and udest · 258c8893
      Simon Horman 提交于
      Make sure that any otherwise uninitialised fields of usvc are zero.
      
      This has been obvserved to cause a problem whereby the port of
      fwmark services may end up as a non-zero value which causes
      scheduling of a destination server to fail for persisitent services.
      
      As observed by Deon van der Merwe <dvdm@truteq.co.za>.
      This fix suggested by Julian Anastasov <ja@ssi.bg>.
      
      For good measure also zero udest.
      
      Cc: Deon van der Merwe <dvdm@truteq.co.za>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      258c8893
  8. 14 12月, 2009 1 次提交
    • X
      ipvs: fix synchronization on connection close · 9abfe315
      Xiaotian Feng 提交于
      commit 9d3a0de7 makes slaves expire as they would do on the master
      with much shorter timeouts. But it introduces another problem:
      When we close a connection, on master server the connection became
      CLOSE_WAIT/TIME_WAIT, it was synced to slaves, but if master is
      finished within it's timeouts (CLOSE), it will not be synced to
      slaves. Then slaves will be kept on CLOSE_WAIT/TIME_WAIT until
      timeout reaches. Thus we should also sync with CLOSE.
      
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NXiaotian Feng <dfeng@redhat.com>
      Acked-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      9abfe315
  9. 02 12月, 2009 2 次提交
  10. 30 11月, 2009 1 次提交
  11. 26 11月, 2009 1 次提交
  12. 23 11月, 2009 3 次提交
    • P
      netfilter: xt_limit: fix invalid return code in limit_mt_check() · 8fa539bd
      Patrick McHardy 提交于
      Commit acc738fe (netfilter: xtables: avoid pointer to self) introduced
      an invalid return value in limit_mt_check().
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      8fa539bd
    • F
      netfilter: xtables: fix conntrack match v1 ipt-save output · 3a042929
      Florian Westphal 提交于
      commit d6d3f08b
      (netfilter: xtables: conntrack match revision 2) does break the
      v1 conntrack match iptables-save output in a subtle way.
      
      Problem is as follows:
      
          up = kmalloc(sizeof(*up), GFP_KERNEL);
      [..]
         /*
          * The strategy here is to minimize the overhead of v1 matching,
          * by prebuilding a v2 struct and putting the pointer into the
          * v1 dataspace.
          */
          memcpy(up, info, offsetof(typeof(*info), state_mask));
      [..]
          *(void **)info  = up;
      
      As the v2 struct pointer is saved in the match data space,
      it clobbers the first structure member (->origsrc_addr).
      
      Because the _v1 match function grabs this pointer and does not actually
      look at the v1 origsrc, run time functionality does not break.
      But iptables -nvL (or iptables-save) cannot know that v1 origsrc_addr
      has been overloaded in this way:
      
      $ iptables -p tcp -A OUTPUT -m conntrack --ctorigsrc 10.0.0.1 -j ACCEPT
      $ iptables-save
      -A OUTPUT -p tcp -m conntrack --ctorigsrc 128.173.134.206 -j ACCEPT
      
      (128.173... is the address to the v2 match structure).
      
      To fix this, we take advantage of the fact that the v1 and v2 structures
      are identical with exception of the last two structure members (u8 in v1,
      u16 in v2).
      
      We extract them as early as possible and prevent the v2 matching function
      from looking at those two members directly.
      
      Previously reported by Michel Messerschmidt via Ben Hutchings, also
      see Debian Bug tracker #556587.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      3a042929
    • P
      netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking · c4832c7b
      Pablo Neira Ayuso 提交于
      Without this patch, if we receive a SYN packet from the client while
      the firewall is out-of-sync, we let it go through. Then, if we see
      the SYN/ACK reply coming from the server, we destroy the conntrack
      entry and drop the packet to trigger a new retransmission. Then,
      the retransmision from the client is used to start a new clean
      session.
      
      This patch improves the current handling. Basically, if we see an
      unexpected SYN packet, we annotate the TCP options. Then, if we
      see the reply SYN/ACK, this means that the firewall was indeed
      out-of-sync. Therefore, we set a clean new session from the existing
      entry based on the annotated values.
      
      This patch adds two new 8-bits fields that fit in a 16-bits gap of
      the ip_ct_tcp structure.
      
      This patch is particularly useful for conntrackd since the
      asynchronous nature of the state-synchronization allows to have
      backup nodes that are not perfect copies of the master. This helps
      to improve the recovery under some worst-case scenarios.
      
      I have tested this by creating lots of conntrack entries in wrong
      state:
      
      for ((i=1024;i<65535;i++)); do conntrack -I -p tcp -s 192.168.2.101 -d 192.168.2.2 --sport $i --dport 80 -t 800 --state ESTABLISHED -u ASSURED,SEEN_REPLY; done
      
      Then, I make some TCP connections:
      
      $ echo GET / | nc 192.168.2.2 80
      
      The events show the result:
      
       [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
       [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
       [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
       [UPDATE] tcp      6 30 LAST_ACK src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
       [UPDATE] tcp      6 120 TIME_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
      
      and tcpdump shows no retransmissions:
      
      20:47:57.271951 IP 192.168.2.101.33221 > 192.168.2.2.www: S 435402517:435402517(0) win 5840 <mss 1460,sackOK,timestamp 4294961827 0,nop,wscale 6>
      20:47:57.273538 IP 192.168.2.2.www > 192.168.2.101.33221: S 3509927945:3509927945(0) ack 435402518 win 5792 <mss 1460,sackOK,timestamp 235681024 4294961827,nop,wscale 4>
      20:47:57.273608 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
      20:47:57.273693 IP 192.168.2.101.33221 > 192.168.2.2.www: P 435402518:435402524(6) ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
      20:47:57.275492 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402524 win 362 <nop,nop,timestamp 235681024 4294961827>
      20:47:57.276492 IP 192.168.2.2.www > 192.168.2.101.33221: P 3509927946:3509928082(136) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
      20:47:57.276515 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509928082 win 108 <nop,nop,timestamp 4294961828 235681025>
      20:47:57.276521 IP 192.168.2.2.www > 192.168.2.101.33221: F 3509928082:3509928082(0) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
      20:47:57.277369 IP 192.168.2.101.33221 > 192.168.2.2.www: F 435402524:435402524(0) ack 3509928083 win 108 <nop,nop,timestamp 4294961828 235681025>
      20:47:57.279491 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402525 win 362 <nop,nop,timestamp 235681025 4294961828>
      
      I also added a rule to log invalid packets, with no occurrences  :-) .
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      c4832c7b
  13. 20 11月, 2009 2 次提交
  14. 18 11月, 2009 1 次提交
  15. 13 11月, 2009 2 次提交
  16. 12 11月, 2009 1 次提交
    • E
      sysctl net: Remove unused binary sysctl code · f8572d8f
      Eric W. Biederman 提交于
      Now that sys_sysctl is a compatiblity wrapper around /proc/sys
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables are unused, and can be
      revmoed.
      
      In addition neigh_sysctl_register has been modified to no longer
      take a strategy argument and it's callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      f8572d8f
  17. 07 11月, 2009 2 次提交
  18. 06 11月, 2009 1 次提交
    • J
      netfilter: nf_nat: fix NAT issue in 2.6.30.4+ · f9dd09c7
      Jozsef Kadlecsik 提交于
      Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
      over NAT. The "cause" of the problem was a fix of unacknowledged data
      detection with NAT (commit a3a9f79e).
      However, actually, that fix uncovered a long standing bug in TCP conntrack:
      when NAT was enabled, we simply updated the max of the right edge of
      the segments we have seen (td_end), by the offset NAT produced with
      changing IP/port in the data. However, we did not update the other parameter
      (td_maxend) which is affected by the NAT offset. Thus that could drift
      away from the correct value and thus resulted breaking active FTP.
      
      The patch below fixes the issue by *not* updating the conntrack parameters
      from NAT, but instead taking into account the NAT offsets in conntrack in a
      consistent way. (Updating from NAT would be more harder and expensive because
      it'd need to re-calculate parameters we already calculated in conntrack.)
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9dd09c7
  19. 05 11月, 2009 1 次提交
  20. 29 10月, 2009 1 次提交
  21. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  22. 12 10月, 2009 1 次提交
  23. 01 10月, 2009 1 次提交
  24. 24 9月, 2009 1 次提交
  25. 22 9月, 2009 1 次提交
  26. 31 8月, 2009 3 次提交
  27. 25 8月, 2009 1 次提交
  28. 24 8月, 2009 1 次提交