1. 20 Jun, 2008 2 commits
  2. 19 Jun, 2008 1 commit
  3. 18 Jun, 2008 7 commits
    •
      netlink: genl: fix circular locking · 6d1a3fb5
      Committed by Patrick McHardy
      genetlink has a circular locking dependency when dumping the registered
      families:
      
      - dump start:
      genl_rcv()            : take genl_mutex
      genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
      netlink_dump_start(),
      netlink_dump()        : take nlk->cb_mutex
      ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a
                              second time
      
      - dump continuance:
      netlink_rcv()         : call netlink_dump
      netlink_dump          : take nlk->cb_mutex
      ctrl_dumpfamily()     : take genl_mutex
      
      Register genl_lock as callback mutex with netlink to fix this. This slightly
      widens an already existing module unload race: the genl ops used during the
      dump might go away when the module is unloaded. Thomas Graf is working on a
      separate fix for this.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Revert "mac80211: Use skb_header_cloned() on TX path." · 3a5be7d4
      Committed by David S. Miller
      This reverts commit 608961a5.
      
      The problem is that the mac80211 stack not only needs to be able to
      muck with the link-level headers, it also might need to mangle all of
      the packet data if doing sw wireless encryption.
      
      This fixes kernel bugzilla #10903.  Thanks to Didier Raboud (for the
      bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
      bringing this bisection analysis to my attention), and Ilpo (for
      trying to analyze this purely from the TCP side).
      
      In 2.6.27 we can take another stab at this, by using something like
      skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
      tx->key.  The ESP protocol code in the IPSEC stack can be used as a
      model for implementation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      af_unix: fix 'poll for write'/ connected DGRAM sockets · 3c73419c
      Committed by Rainer Weikusat
      The unix_dgram_sendmsg routine implements a (somewhat crude)
      form of receiver-imposed flow control by comparing the length of the
      receive queue of the 'peer socket' with the max_ack_backlog value
      stored in the corresponding sock structure, either blocking
      the thread which caused the send-routine to be called or returning
      EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
      sockets. The poll-implementation for these socket types is
      datagram_poll from core/datagram.c. A socket is deemed to be writeable
      by this routine when the memory presently consumed by datagrams
      owned by it is less than the configured socket send buffer size. This
      is always wrong for connected PF_UNIX non-stream sockets when the
      abovementioned receive queue is currently considered to be full.
      'poll' will then return, indicating that the socket is writeable, but
      a subsequent write results in EAGAIN, effectively causing a
      (typical) application to 'poll for writeability by repeated send request
      with O_NONBLOCK set' until it has consumed its time quantum.
      
      The change below uses a suitably modified variant of the datagram_poll
      routine for both types of PF_UNIX sockets, which tests if the
      recv-queue of the peer a socket is connected to is presently
      considered to be 'full' as part of the 'is this socket
      writeable'-checking code. The socket being polled is additionally
      put onto the peer_wait wait queue associated with its peer, because the
      unix_dgram_sendmsg routine does a wake up on this queue after a
      datagram was received and the 'other wakeup call' is done implicitly
      as part of skb destruction, meaning, a process blocked in poll
      because of a full peer receive queue could otherwise sleep forever
      if no datagram owned by its socket was already sitting on this queue.
      This change also includes a small (inline) helper routine named
      'unix_recvq_full', which consolidates the actual testing code (in three
      different places) into a single location.
      Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
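The writeability rule described in the af_unix commit above can be sketched roughly as follows. This is a minimal illustration, not the kernel code: `struct toy_sock`, its fields, and the `toy_*` functions are hypothetical stand-ins, and the exact comparison used for "queue is full" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a datagram socket
 * (not the kernel's struct sock). */
struct toy_sock {
    size_t recv_queue_len;   /* datagrams waiting in the receive queue */
    size_t max_ack_backlog;  /* receiver-imposed limit on that queue */
    size_t wmem_used;        /* memory held by datagrams this socket owns */
    size_t sndbuf;           /* configured send buffer size */
    struct toy_sock *peer;   /* connected peer, or NULL if unconnected */
};

/* Single place for the "receive queue is full" test, in the spirit of the
 * unix_recvq_full helper named in the commit message. The '>' comparison
 * is an assumption made for this sketch. */
static bool toy_recvq_full(const struct toy_sock *sk)
{
    return sk->recv_queue_len > sk->max_ack_backlog;
}

/* Old rule: only the sender's own buffer usage is considered. */
static bool toy_writable_old(const struct toy_sock *sk)
{
    return sk->wmem_used < sk->sndbuf;
}

/* New rule: a connected socket is not writeable while its peer's
 * receive queue is considered full. */
static bool toy_writable_new(const struct toy_sock *sk)
{
    if (!toy_writable_old(sk))
        return false;
    if (sk->peer != NULL && toy_recvq_full(sk->peer))
        return false;
    return true;
}
```

With a full peer queue, `toy_writable_old` still reports the socket as writeable while `toy_writable_new` does not, which is the mismatch between poll and send that the commit describes.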
    •
      xfrm: fix fragmentation for ipv4 xfrm tunnel · fe833fca
      Committed by Steffen Klassert
      When generating the ip header for the transformed packet we just copy
      the frag_off field of the ip header from the original packet to the ip
      header of the new generated packet. If we receive a packet as a chain
      of fragments, all but the last of the new generated packets have the
      IP_MF flag set. We have to mask the frag_off field to only keep the
      IP_DF flag from the original packet. This got lost with git commit
      36cf9acf ("[IPSEC]: Separate inner/outer mode processing on output").
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
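The masking described in the xfrm commit above can be illustrated with a small sketch. The `TOY_*` constants mirror the standard IPv4 flag bit values (DF = 0x4000, MF = 0x2000) and the function names are hypothetical. Values are kept in host byte order for simplicity, whereas the real header field is in network byte order.

```c
#include <stdint.h>

/* IPv4 frag_off layout: flag bits on top, 13-bit fragment offset below. */
#define TOY_IP_DF     0x4000u  /* don't fragment */
#define TOY_IP_MF     0x2000u  /* more fragments */
#define TOY_IP_OFFSET 0x1FFFu  /* fragment offset mask */

/* Buggy behaviour: the inner frag_off is copied verbatim, so MF and the
 * offset of an inner fragment leak into the outer tunnel header. */
static uint16_t toy_outer_frag_off_buggy(uint16_t inner_frag_off)
{
    return inner_frag_off;
}

/* Fixed behaviour: keep only the DF flag from the inner header. */
static uint16_t toy_outer_frag_off_fixed(uint16_t inner_frag_off)
{
    return (uint16_t)(inner_frag_off & TOY_IP_DF);
}
```

For an inner packet that is a non-final fragment (MF set, non-zero offset), the fixed variant yields 0, so the outer packet is no longer mistaken for a fragment.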
    •
      netfilter: nf_conntrack_h323: fix module unload crash · a56b8f81
      Committed by Patrick McHardy
      The H.245 helper is not registered/unregistered, but assigned to
      connections manually from the Q.931 helper. This means on unload
      existing expectations and connections using the helper are not
      cleaned up, leading to the following oops on module unload:
      
      CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
      Oops[#1]:
      Cpu 0
      $ 0   : 00000000 00000000 00000004 c00a67f0
      $ 4   : 802a5ad0 81657e00 00000000 00000000
      $ 8   : 00000008 801461c8 00000000 80570050
      $12   : 819b0280 819b04b0 00000006 00000000
      $16   : 802a5a60 80000000 80b46000 80321010
      $20   : 00000000 00000004 802a5ad0 00000001
      $24   : 00000000 802257a8
      $28   : 802a4000 802a59e8 00000004 801d4e7c
      Hi    : 0000000b
      Lo    : 00506320
      epc   : 802224dc ip_conntrack_help+0x38/0x74     Tainted: P
      ra    : 801d4e7c nf_iterate+0xbc/0x130
      Status: 1000f403    KERNEL EXL IE
      Cause : 00800008
      BadVA : c00a6828
      PrId  : 00019374
      Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
      Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
      Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
              00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
              801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
              80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
              81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
              ...
      Call Trace:
       [<801e7d98>] ip_finish_output+0x0/0x23c
       [<801d4e7c>] nf_iterate+0xbc/0x130
       [<801d4e7c>] nf_iterate+0xbc/0x130
       [<801e7d98>] ip_finish_output+0x0/0x23c
       [<801e7d98>] ip_finish_output+0x0/0x23c
       [<801d4f8c>] nf_hook_slow+0x9c/0x1a4
      
      One way to fix this would be to split helper cleanup from the unregistration
      function and invoke it for the H.245 helper, but since ctnetlink needs to be
      able to find the helper for synchronization purposes, a better fix is to
      register it normally and make sure it's not assigned to connections during
      helper lookup. The missing l3num initialization is enough for this; this
      patch changes it to use AF_UNSPEC to make it more explicit, though.
      Reported-by: liannan <liannan@twsz.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      netfilter: nf_conntrack_h323: fix memory leak in module initialization error path · 8a548868
      Committed by Patrick McHardy
      Properly free h323_buffer when helper registration fails.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      netfilter: nf_nat: fix RCU races · 68b80f11
      Committed by Patrick McHardy
      Fix three ct_extend/NAT extension related races:
      
      - When cleaning up the extension area and removing it from the bysource hash,
        the nat->ct pointer must not be set to NULL since it may still be used in
        an RCU read side
      
      - When replacing a NAT extension area in the bysource hash, the nat->ct
        pointer must be assigned before performing the replacement
      
      - When reallocating extension storage in ct_extend, the old memory must
        not be freed immediately since it may still be used by an RCU read side
      
      Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
      and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Jun, 2008 11 commits
  5. 14 Jun, 2008 2 commits
  6. 13 Jun, 2008 2 commits
    •
      tcp: Revert 'process defer accept as established' changes. · ec0a1966
      Committed by David S. Miller
      This reverts two changesets, ec3c0982
      ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
      the follow-on bug fix 9ae27e0a
      ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
      
      This change causes several problems, first reported by Ingo Molnar
      as a distcc-over-loopback regression where connections were getting
      stuck.
      
      Ilpo Järvinen first spotted the locking problems.  The new function
      added by this code, tcp_defer_accept_check(), only has the
      child socket locked, yet it is modifying state of the parent
      listening socket.
      
      Fixing that is non-trivial at best, because we can't simply just grab
      the parent listening socket lock at this point, because it would
      create an ABBA deadlock.  The normal ordering is parent listening
      socket --> child socket, but this code path would require the
      reverse lock ordering.
      
      Next is a problem noticed by Vitaliy Gusev, who noted:
      
      ----------------------------------------
      >--- a/net/ipv4/tcp_timer.c
      >+++ b/net/ipv4/tcp_timer.c
      >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
      > 		goto death;
      > 	}
      >
      >+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
      >+		tcp_send_active_reset(sk, GFP_ATOMIC);
      >+		goto death;
      
      Here socket sk is not attached to listening socket's request queue. tcp_done()
      will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
      release this sk) as socket is not DEAD. Therefore socket sk will be lost for
      freeing.
      ----------------------------------------
      
      Finally, Alexey Kuznetsov argues that there might not even be any
      real value or advantage to these new semantics even if we fix all
      of the bugs:
      
      ----------------------------------------
      Hiding from accept() sockets with only out-of-order data only
      is the only thing which is impossible with old approach. Is this really
      so valuable? My opinion: no, this is nothing but a new loophole
      to consume memory without control.
      ----------------------------------------
      
      So revert this thing for now.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      ipv6: Fix duplicate initialization of rawv6_prot.destroy · f23d60de
      Committed by David S. Miller
      In changeset 22dd4850
      ("raw: Raw socket leak.") code was added so that we
      flush pending frames on raw sockets to avoid leaks.
      
      The ipv4 part was fine, but the ipv6 part was not
      done correctly.  Unlike the ipv4 side, the ipv6 code
      already has a .destroy method for rawv6_prot.
      
      So now there were two assignments to this member, and
      what the compiler does is use the last one, effectively
      making the ipv6 parts of that changeset a NOP.
      
      Fix this by removing the:
      
      	.destroy	   = inet6_destroy_sock,
      
      line, and adding an inet6_destroy_sock() call to the
      end of raw6_destroy().
      
      Noticed by Al Viro.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
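The shape of the rawv6_prot fix above can be sketched with a toy protocol table. All names here (`struct toy_proto`, `toy_raw_destroy`, and so on) are hypothetical. The point is only that a single `.destroy` member does the protocol-specific work and then chains to the generic destructor, instead of the member being initialized twice.

```c
/* Hypothetical stand-in for a protocol operations table. */
struct toy_proto {
    void (*destroy)(unsigned *done);
};

#define TOY_FLUSHED 1u  /* pending frames were flushed */
#define TOY_GENERIC 2u  /* generic socket teardown ran */

/* Generic teardown, playing the role of inet6_destroy_sock(). */
static void toy_generic_destroy(unsigned *done)
{
    *done |= TOY_GENERIC;
}

/* Protocol-specific teardown, playing the role of raw6_destroy():
 * flush first, then call the generic destructor at the end. */
static void toy_raw_destroy(unsigned *done)
{
    *done |= TOY_FLUSHED;
    toy_generic_destroy(done);
}

/* Writing two initializers for the same member, e.g.
 *     .destroy = toy_raw_destroy,
 *     .destroy = toy_generic_destroy,
 * would leave only the last one in effect, which is the bug the commit
 * describes. A single initializer avoids that. */
static const struct toy_proto toy_raw_prot = {
    .destroy = toy_raw_destroy,
};
```

Calling `toy_raw_prot.destroy()` now performs both steps, whereas the duplicated initializer would silently have skipped the flush.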
  7. 12 Jun, 2008 7 commits
  8. 11 Jun, 2008 8 commits
    •
      dccp: Bug in initial acknowledgment number assignment · be4c798a
      Committed by Gerrit Renker
      Step 8.5 in RFC 4340 says for the newly cloned socket
      
                 Initialize S.GAR := S.ISS,
      
      but what in fact the code (minisocks.c) does is
      
                 Initialize S.GAR := S.ISR,
      
      which is wrong (typo?) -- fixed by the patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    •
      dccp ccid-3: X truncated due to type conversion · 7deb0f85
      Committed by Gerrit Renker
      This fixes a bug in computing the inter-packet-interval t_ipi = s/X: 
      
       scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
       sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
       than 2^26 Bps (~537 Mbps).
      
      Using full 64-bit division now.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
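The truncation described in the ccid-3 commit above is easy to reproduce in isolation. In this sketch the helper names are hypothetical. The only facts taken from the commit message are that X is a 64-bit value scaled by 2^6 and that it was passed through a 32-bit parameter.

```c
#include <stdint.h>

#define TOY_X_SHIFT 6  /* X is stored scaled by 2^6 */

/* Scale a sending rate in bytes per second the way the message describes. */
static uint64_t toy_scale_rate(uint64_t bytes_per_sec)
{
    return bytes_per_sec << TOY_X_SHIFT;
}

/* What happens when the scaled 64-bit X is passed as a u32 argument:
 * the upper 32 bits are silently dropped. */
static uint32_t toy_as_u32_arg(uint64_t scaled_x)
{
    return (uint32_t)scaled_x;
}
```

A rate of 2^26 bytes per second scales to exactly 2^32, which truncates to 0 as a 32-bit divisor; any rate just below 2^26 still fits. That boundary matches the ~537 Mbps figure in the message (2^26 * 8 bits per second).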
    •
      dccp ccid-3: TFRC reverse-lookup Bug-Fix · 1e8a287c
      Committed by Gerrit Renker
      This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
      the function returned the smallest tabulated value f(p).
      
      The smallest tabulated value of
      	 
         10^6 * f(p) =  sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p) 
      
      for p=0.0001 is 8172. 
      
      Since this value is scaled by 10^6, the outcome of this bug is that a loss
      of 8172/10^6 = 0.8172% was reported whenever the input was below the table
      resolution of 0.01%.
      
      This means that the value was over 80 times too high, resulting in large spikes
      of the initial loss interval, thus unnecessarily reducing the throughput.
      
      Also corrected the printk format (%u for u32).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
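The "over 80 times too high" figure in the TFRC commit above follows from the two scaled constants. The sketch below only models the below-resolution case. The function names are hypothetical, and returning the smallest tabulated p in the fixed variant is an assumption based on the message's wording ("instead of p").

```c
#include <stdint.h>

/* Both values are scaled by 10^6, as in the commit message. */
#define TOY_SMALLEST_P 100u   /* p = 0.0001, i.e. the 0.01% table resolution */
#define TOY_SMALLEST_F 8172u  /* 10^6 * f(0.0001), quoted in the message */

/* Buggy reverse lookup for an input below the table: returns f(p). */
static uint32_t toy_lookup_below_table_buggy(void)
{
    return TOY_SMALLEST_F;
}

/* Fixed reverse lookup for the same input: returns p itself (assumed). */
static uint32_t toy_lookup_below_table_fixed(void)
{
    return TOY_SMALLEST_P;
}

/* How far the buggy result overstates the loss rate (integer division). */
static uint32_t toy_overestimate_factor(void)
{
    return toy_lookup_below_table_buggy() / toy_lookup_below_table_fixed();
}
```

Here 8172 / 100 gives 81 in integer arithmetic, so a reported loss of 0.8172% stood in for an actual value at or below 0.01%.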
    •
      dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets · 65907a43
      Committed by Gerrit Renker
      This fixes an oversight from an earlier patch, ensuring that Ack Vectors
      are not processed on request sockets.
      
      The issue is that Ack Vectors must not be parsed on request sockets, since
      the Ack Vector feature depends on the selection of the (TX) CCID. During the
      initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:
      
       "Using CCID-specific options and feature options during a negotiation
        for the corresponding CCID feature is NOT RECOMMENDED [...]"
      
      And it is not even possible: when the server receives the Request from the
      client, the CCID and Ack vector features are undefined; when the Ack finalising
      the 3-way handshake arrives, the request socket has not been cloned yet into a
      full socket. (This order is necessary, since otherwise the newly created socket
      would have to be destroyed whenever an option error occurred - a malicious
      hacker could simply send garbage options and exploit this.)
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    •
      dccp: Fix sparse warnings · 1e2f0e5e
      Committed by Gerrit Renker
      This patch fixes the following sparse warnings:
       * nested min(max()) expression:
         net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
         net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
         
       * Declaration of function prototypes in .c instead of .h file, resulting in
         "should it be static?" warnings. 
      
       * Declared "struct dccpw" static (local to dccp_probe).
       
       * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
         ("Receivers SHOULD implement delayed acknowledgement timers ...").
      
       * Used a different local variable name to avoid
         net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
         net/dccp/ackvec.c:238:33: originally declared here
      
       * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    •
      dccp ccid-3: Bug-Fix - Zero RTT is possible · 3294f202
      Committed by Gerrit Renker
      In commit $(825de27d) (from 27th May, commit
      message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
      computation was fixed to cope with RTTs < 4 microseconds.
      
      Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
      a check against RTT < 4, but introduced a divide-by-zero bug.
      
      All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
      ensures non-zero samples. However, a zero RTT is possible on initialisation,
      when there is no RTT sample from the Request/Response exchange.
      
      The fix is to use the fallback-RTT from RFC 4340, 3.4.
      
      This is also better than just fixing update_win_count() since it allows other
      parts of the code to always assume that the RTT is non-zero during the time
      that the CCID is used.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
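The idea behind the zero-RTT fix above can be sketched as "substitute a fallback once, at initialisation, so later divisions never see zero". The names below are hypothetical. The 0.2 second value is my reading of the default RTT in RFC 4340, 3.4 and should be treated as an assumption, as should the quarter-RTT formula, which only loosely follows the window counter computation.

```c
#include <stdint.h>

/* Assumed fallback RTT of 0.2 s, expressed in microseconds. */
#define TOY_FALLBACK_RTT_US 200000u

/* Pick the RTT to start with: use the handshake sample if there is one,
 * otherwise fall back so the result is never zero. */
static uint32_t toy_initial_rtt_us(uint32_t handshake_sample_us)
{
    return handshake_sample_us ? handshake_sample_us : TOY_FALLBACK_RTT_US;
}

/* Number of quarter-RTTs elapsed; safe only because rtt_us is non-zero. */
static uint32_t toy_quarter_rtts(uint32_t delta_us, uint32_t rtt_us)
{
    return (uint32_t)((4ull * delta_us) / rtt_us);
}
```

Guarding the value where it is first set, rather than at each division, matches the commit's argument that other code can then always assume a non-zero RTT.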
    •
      net: Fix routing tables with id > 255 for legacy software · 709772e6
      Committed by Krzysztof Piotr Oledzki
      Most legacy software does not like tables > 255, as rtm_table is u8,
      so tb_id is sent &0xff and it is possible to mismatch, for example,
      table 510 with table 254 (main).
      
      This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
      tb_id > 255. This keeps such old applications happy, while new
      ones are still able to use RTA_TABLE to get a proper table id.
      Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
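The table-id aliasing described in the routing commit above, and the compat value that avoids it, can be shown with two small helpers. The function names are hypothetical. The value 252 and the `> 255` condition come from the commit message.

```c
#include <stdint.h>

#define TOY_RT_TABLE_COMPAT 252u  /* RT_TABLE_COMPAT per the commit message */

/* Old behaviour: the 32-bit table id is squeezed into the u8 rtm_table
 * field, so large ids alias small ones. */
static uint8_t toy_rtm_table_legacy(uint32_t tb_id)
{
    return (uint8_t)(tb_id & 0xff);
}

/* New behaviour: ids that do not fit are reported as the compat table. */
static uint8_t toy_rtm_table_compat(uint32_t tb_id)
{
    return tb_id > 255 ? (uint8_t)TOY_RT_TABLE_COMPAT : (uint8_t)tb_id;
}
```

Table 510 maps to 254 (main) under the legacy rule but to 252 under the compat rule, while ids up to 255 pass through unchanged.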
    •
      ipsec: pfkey should ignore events when no listeners · 99c6f60e
      Committed by Jamal Hadi Salim
      When pfkey has no km listeners, it still does a lot of work
      before finding out there ain't nobody out there.
      If a tree falls in a forest and no one is around to hear it, does it make
      a sound? In this case it makes a lot of noise.
      With this short-circuit, adding tens of thousands of SAs using
      netlink is ~10% faster.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>