1. 21 Jun, 2005: 3 commits
    • [NETLINK]: fib_lookup() via netlink · 246955fe
      Robert Olsson committed
      Below is a more generic patch to do fib_lookup() via netlink. For the
      benefit of others: we discussed this as a way to verify route selection.
      There may also be other uses for it.
      
      In short, the caller fills in the first half of struct fib_result_nl,
      and the netlink call fills in the other half and returns it.
      
      In case anyone is interested, there is a corresponding user-space app
      that compares the full routing table; it was used to test the LC-trie
      implementation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
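The "caller fills one half, the lookup fills the other" convention can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code. `struct route_query`, its fields, the table `routes` and `lookup_route()` are hypothetical names. A linear longest-prefix scan stands in for the kernel's FIB (LC-trie) lookup, and a plain function call stands in for the netlink round trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request/reply struct, loosely modeled on the idea behind
 * struct fib_result_nl: the first half is input, the second half output. */
struct route_query {
    /* Filled in by the caller. */
    uint32_t dst;        /* destination address, host byte order */
    /* Filled in by the lookup. */
    uint8_t  prefixlen;  /* length of the matching prefix */
    int      oif;        /* hypothetical output interface index */
    int      err;        /* 0 on success, -1 if no route matched */
};

struct route {
    uint32_t prefix;
    uint8_t  len;
    int      oif;
};

/* Hypothetical in-memory routing table. */
static const struct route routes[] = {
    { 0x00000000u,  0, 1 },   /* 0.0.0.0/0 (default route) */
    { 0x0A000000u,  8, 2 },   /* 10.0.0.0/8 */
    { 0x0A010000u, 16, 3 },   /* 10.1.0.0/16 */
};

static uint32_t prefix_mask(uint8_t len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match over the table; fills in only the result half. */
void lookup_route(struct route_query *q)
{
    const struct route *best = NULL;

    assert(q != NULL);
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        const struct route *r = &routes[i];
        if ((q->dst & prefix_mask(r->len)) == r->prefix &&
            (best == NULL || r->len > best->len))
            best = r;
    }
    if (best == NULL) {
        q->err = -1;
        return;
    }
    q->prefixlen = best->len;
    q->oif = best->oif;
    q->err = 0;
}
```

A caller sets only `dst`, calls `lookup_route()`, and reads back `prefixlen`, `oif` and `err`. In the patch described above, the same struct travels to the kernel and back over netlink instead.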
    • [IPSEC]: Add XFRM_STATE_NOPMTUDISC flag · dd87147e
      Herbert Xu committed
      This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states.  It is
      similar to the nopmtudisc on IPIP/GRE tunnels.  It only has an effect
      on IPv4 tunnel mode states.  For these states, it will ensure that the
      DF flag is always cleared.
      
      This is primarily useful to work around ICMP blackholes.
      
      In the future this flag could also allow a larger MTU to be set within
      the tunnel, just like IPIP/GRE tunnels.  This could be useful for
      short-haul tunnels where temporary fragmentation outside the tunnel is
      preferred over smaller fragments inside the tunnel.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: James Morris <jmorris@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
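The effect of the flag on an IPv4 tunnel-mode state can be sketched as a single decision about the outer header's fragment field. This is an illustrative model, not the patch itself. `STATE_FLAG_NOPMTUDISC` and `outer_frag_off()` are hypothetical names. The assumption that, without the flag, the outer header inherits DF from the inner packet reflects usual tunnel behaviour and is not stated in the commit. `IP_DF` is the standard IPv4 Don't Fragment bit, 0x4000, used here in host byte order.

```c
#include <assert.h>
#include <stdint.h>

#define IP_DF 0x4000u                  /* IPv4 "Don't Fragment" bit */
#define STATE_FLAG_NOPMTUDISC 0x1u     /* hypothetical flag value */

/* Compute the outer header's frag_off (host byte order) for an IPv4
 * tunnel-mode state.  With the flag set, DF is always cleared, so the
 * encapsulated packet may be fragmented on the path instead of relying on
 * ICMP "fragmentation needed" messages that a blackhole would drop.
 * Without the flag, DF is copied from the inner header (assumption). */
uint16_t outer_frag_off(unsigned int state_flags, uint16_t inner_frag_off)
{
    if (state_flags & STATE_FLAG_NOPMTUDISC)
        return 0;
    return (uint16_t)(inner_frag_off & IP_DF);
}
```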
    • [IPSEC]: Add xfrm_init_state · 72cb6962
      Herbert Xu committed
      This patch adds xfrm_init_state which is simply a wrapper that calls
      xfrm_get_type and subsequently x->type->init_state.  It also gets rid
      of the unused args argument.
      
      Abstracting it out allows us to add common initialisation code, e.g.,
      to set family-specific flags.
      
      The add_time setting in xfrm_user.c was deleted because it's already
      set by xfrm_state_alloc.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: James Morris <jmorris@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
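The wrapper pattern described in the commit can be sketched in plain C: look up the type, leave room for shared setup, then call the type's `init_state` hook. All names here are hypothetical stand-ins for `xfrm_get_type()`, `x->type->init_state` and `xfrm_init_state()`: `struct sa_state`, `struct sa_type`, `get_type()`, `init_state()` and `esp_init()`. The real functions take different arguments and also deal with module loading and reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct sa_state;

/* Per-protocol operations, analogous to an xfrm type. */
struct sa_type {
    int proto;
    int (*init_state)(struct sa_state *x);
};

struct sa_state {
    int proto;
    const struct sa_type *type;
    int initialised;
};

/* Hypothetical type-specific hook. */
static int esp_init(struct sa_state *x)
{
    x->initialised = 1;
    return 0;
}

static const struct sa_type esp_type = { 50, esp_init };  /* 50 = IPPROTO_ESP */

/* Hypothetical type registry lookup. */
static const struct sa_type *get_type(int proto)
{
    return proto == esp_type.proto ? &esp_type : NULL;
}

/* The wrapper: one place to resolve the type, run common initialisation,
 * and then hand off to the type-specific hook. */
int init_state(struct sa_state *x)
{
    assert(x != NULL);
    x->type = get_type(x->proto);
    if (x->type == NULL)
        return -1;                  /* unsupported protocol */
    /* Common (e.g. family-specific) initialisation would go here. */
    return x->type->init_state(x);
}
```

Callers then have a single entry point. Adding shared setup later touches only `init_state()`, not every caller.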
  2. 19 Jun, 2005: 13 commits
  3. 16 Jun, 2005: 1 commit
  4. 14 Jun, 2005: 5 commits
  5. 03 Jun, 2005: 1 commit
  6. 01 Jun, 2005: 1 commit
  7. 31 May, 2005: 2 commits
    • [IPV4]: Fix BUG() in 2.6.x, udp_poll(), fragments + CONFIG_HIGHMEM · 208d8984
      Herbert Xu committed
      Steven Hand <Steven.Hand@cl.cam.ac.uk> wrote:
      > 
      > Reconstructed forward trace: 
      > 
      >   net/ipv4/udp.c:1334   spin_lock_irq() 
      >   net/ipv4/udp.c:1336   udp_checksum_complete() 
      > net/core/skbuff.c:1069   skb_shinfo(skb)->nr_frags > 1
      > net/core/skbuff.c:1086   kunmap_skb_frag()
      > net/core/skbuff.h:1087   local_bh_enable()
      > kernel/softirq.c:0140   WARN_ON(irqs_disabled());
      
      The receive queue lock is never taken in IRQs (and never should be), so
      we can simply substitute bh for irq, i.e. use spin_lock_bh() instead of
      spin_lock_irq().
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
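The warning in the trace can be reproduced with a few counters. This is a single-threaded simulation, not kernel code. `struct cpu_state`, `sim_local_bh_disable()`, `sim_local_bh_enable()`, `sim_checksum_frags()` and `sim_poll()` are hypothetical. The model assumes, as the trace suggests, that with CONFIG_HIGHMEM the fragment map/unmap helpers disable and re-enable bottom halves. Re-enabling bottom halves warns when hardware interrupts are off, which happens under the irq lock variant but not under the bh one.

```c
#include <assert.h>

/* Simulated per-CPU state. */
struct cpu_state {
    int irqs_disabled;   /* set while an _irq lock is held */
    int bh_disabled;     /* nesting count of bottom-half disables */
    int warnings;        /* number of WARN_ON(irqs_disabled()) hits */
};

static void sim_local_bh_disable(struct cpu_state *c) { c->bh_disabled++; }

static void sim_local_bh_enable(struct cpu_state *c)
{
    if (c->irqs_disabled)          /* mirrors WARN_ON(irqs_disabled()) */
        c->warnings++;
    assert(c->bh_disabled > 0);
    c->bh_disabled--;
}

/* Stand-in for checksumming a fragmented skb: mapping each highmem
 * fragment disables bottom halves, unmapping re-enables them. */
static void sim_checksum_frags(struct cpu_state *c, int nr_frags)
{
    for (int i = 0; i < nr_frags; i++) {
        sim_local_bh_disable(c);   /* map fragment (kmap_skb_frag) */
        sim_local_bh_enable(c);    /* unmap fragment (kunmap_skb_frag) */
    }
}

/* Poll with the receive-queue lock taken in either style; returns the
 * number of warnings triggered. */
int sim_poll(int use_irq_lock, int nr_frags)
{
    struct cpu_state c = { 0, 0, 0 };

    if (use_irq_lock)
        c.irqs_disabled = 1;       /* spin_lock_irq() */
    else
        sim_local_bh_disable(&c);  /* spin_lock_bh() */

    sim_checksum_frags(&c, nr_frags);

    if (use_irq_lock)
        c.irqs_disabled = 0;       /* spin_unlock_irq() */
    else
        sim_local_bh_enable(&c);   /* spin_unlock_bh() */

    return c.warnings;
}
```

`sim_poll(1, n)` warns once per fragment, while `sim_poll(0, n)` never warns. The bh variant still keeps softirqs off the lock, which is all that is needed because the lock is never taken from hard IRQ context.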
    • [NETFILTER]: Fix deadlock with ip_queue and tcp local input path. · 9bb7bc94
      Harald Welte committed
      When ip_queue is used from LOCAL_IN, the verdicts coming back from
      userspace traverse the TCP input path from syscall context.  While this
      seems to work most of the time, there's an ugly deadlock:
      
      The syscall context is interrupted by the timer interrupt.  When the
      timer interrupt returns, the timer softirq gets scheduled and calls
      tcp_delack_timer() and the like.  These in turn do bh_lock_sock(sk),
      which is already held elsewhere -> boom.
      
      I've now tested the solution suggested by Patrick McHardy and Herbert Xu:
      simply use local_bh_{en,dis}able().
      Signed-off-by: Harald Welte <laforge@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
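Why disabling bottom halves removes the deadlock can be shown with a small single-threaded simulation. All names here are hypothetical: `struct sim_cpu`, `timer_interrupt()`, `run_timer_softirq()` and `reinject_verdict()`. A "deadlock" is recorded rather than actually spinning. The model assumes only what the commit describes. Pending softirqs run when the interrupt returns unless bottom halves are disabled, and otherwise they run when `local_bh_enable()` drops the last disable, by which point the socket lock has been released.

```c
#include <assert.h>

struct sim_cpu {
    int bh_disabled;      /* local_bh_disable() nesting depth */
    int softirq_pending;  /* timer softirq raised but not yet run */
    int sock_locked;      /* bh_lock_sock(sk) currently held */
    int deadlocked;       /* softirq found the socket lock already held */
};

/* Timer softirq body: tcp_delack_timer() and friends take bh_lock_sock(). */
static void run_timer_softirq(struct sim_cpu *c)
{
    c->softirq_pending = 0;
    if (c->sock_locked)
        c->deadlocked = 1;   /* would spin forever on the same CPU */
}

/* Interrupt return: pending softirqs run only if BHs are not disabled. */
static void timer_interrupt(struct sim_cpu *c)
{
    c->softirq_pending = 1;
    if (c->bh_disabled == 0)
        run_timer_softirq(c);
}

static void sim_local_bh_disable(struct sim_cpu *c) { c->bh_disabled++; }

static void sim_local_bh_enable(struct sim_cpu *c)
{
    assert(c->bh_disabled > 0);
    if (--c->bh_disabled == 0 && c->softirq_pending)
        run_timer_softirq(c);   /* the deferred softirq runs now */
}

/* Reinject a queued verdict into the TCP input path from syscall context;
 * returns 1 if the simulated deadlock occurred. */
int reinject_verdict(int disable_bh)
{
    struct sim_cpu c = { 0, 0, 0, 0 };

    if (disable_bh)
        sim_local_bh_disable(&c);
    c.sock_locked = 1;        /* bh_lock_sock(sk) in the input path */
    timer_interrupt(&c);      /* the timer fires mid-processing */
    c.sock_locked = 0;        /* bh_unlock_sock(sk) */
    if (disable_bh)
        sim_local_bh_enable(&c);
    return c.deadlocked;
}
```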
  8. 30 May, 2005: 2 commits
  9. 24 May, 2005: 1 commit
    • [TCP]: Fix stretch ACK performance killer when doing ucopy. · 31432412
      David S. Miller committed
      When we are doing ucopy, we try to defer the ACK generation to
      cleanup_rbuf().  This works very well most of the time, but if the
      ucopy prequeue is large, this ACKing behavior kills performance.
      
      With TSO, the prequeue can grow so large that by the time the ACK is
      sent and gets back to the sender, most of the window has emptied of
      data and performance suffers significantly.
      
      This behavior does help in some cases, so we should think about
      re-enabling this trick in the future, using some kind of limit in
      order to avoid the bug case.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 20 May, 2005: 2 commits
  11. 19 May, 2005: 1 commit
    • [IPV4/IPV6] Ensure all frag_list members have NULL sk · 2fdba6b0
      Herbert Xu committed
      Having frag_list members which hold wmem of an sk leads to nightmares
      with partially cloned frag skbs.  The reason is that once you unleash
      an skb with a frag_list that has individual sk ownerships into the
      stack, you can never undo those ownerships safely, as they may have
      been cloned by things like netfilter.  Since we have to undo them in
      order to make skb_linearize happy, this approach leads to a dead end.
      
      So let's go the other way and make this an invariant:
      
      	For any skb on a frag_list, skb->sk must be NULL.
      
      That is, the socket ownership always belongs to the head skb.
      It turns out that the implementation is actually pretty simple.
      
      The above invariant is actually violated in the following patch
      for a short duration inside ip_fragment.  This is OK because the
      offending frag_list member is either destroyed at the end of the
      slow path without being sent anywhere, or it is detached from
      the frag_list before being sent.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
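The invariant can be illustrated with a simplified buffer model in which attaching a fragment moves its memory charge to the head and clears the fragment's owner. This is a sketch, not the kernel implementation. `struct sock_model`, `struct buf`, `frag_list_append()` and `frag_list_invariant_holds()` are hypothetical. The choice to fold the fragment's `truesize` into the head is an assumption about how "ownership always belongs to the head skb" is accounted.

```c
#include <assert.h>
#include <stddef.h>

struct sock_model { int id; };   /* hypothetical socket */

struct buf {
    struct buf *next;          /* next fragment on the frag_list */
    struct buf *frag_list;     /* only used on the head buffer */
    struct sock_model *sk;     /* owning socket, or NULL */
    unsigned int truesize;     /* memory charged for this buffer */
};

/* Append frag to head's frag_list and transfer its socket ownership:
 * the charge moves to the head, and frag->sk becomes NULL. */
void frag_list_append(struct buf *head, struct buf *frag)
{
    struct buf **tail = &head->frag_list;

    assert(frag->sk == NULL || frag->sk == head->sk);
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = frag;
    frag->next = NULL;

    head->truesize += frag->truesize;   /* head now carries the charge */
    frag->sk = NULL;                    /* members own nothing */
}

/* Check the invariant: for any buffer on a frag_list, sk must be NULL. */
int frag_list_invariant_holds(const struct buf *head)
{
    for (const struct buf *f = head->frag_list; f != NULL; f = f->next)
        if (f->sk != NULL)
            return 0;
    return 1;
}
```

Because only the head refers to the socket, cloning or linearizing the fragments never has to undo per-fragment ownership.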
  12. 06 May, 2005: 2 commits
  13. 04 May, 2005: 6 commits
    • [IPSEC]: Store idev entries · aabc9761
      Herbert Xu committed
      I found a bug that stopped IPsec/IPv6 from working.  About
      a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
      entries.  If the cached socket dst entry is IPsec, then rt6i_idev will
      be NULL.
      
      Since we want to look at the rt6i_idev of the original route in this
      case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
      as we do for a number of other IPv6 route attributes.  Unfortunately
      this means that we need some new code to handle the references to
      rt6i_idev.  That's why this patch is bigger than it would otherwise be.
      
      I've also done the same thing for IPv4, since it is conceivable that
      once these idev attributes start getting used for accounting, we will
      need to dereference them for IPv4 IPsec entries too.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
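The "new code to handle the references" mentioned above follows the usual hold/put pattern. The IPsec entry copies the original route's idev pointer and pins it for its own lifetime. This is a hypothetical model. `struct idev_model`, `struct route_model`, `struct bundle_model`, `bundle_init()` and `bundle_destroy()` stand in for the IPv6 device info, the original route and the IPsec dst entry, and the real reference helpers differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted device info, standing in for the idev. */
struct idev_model {
    int refcnt;
    int ifindex;
};

static void idev_hold(struct idev_model *idev) { idev->refcnt++; }

static void idev_put(struct idev_model *idev)
{
    assert(idev->refcnt > 0);
    idev->refcnt--;
}

struct route_model  { struct idev_model *idev; };   /* original route */
struct bundle_model { struct idev_model *idev; };   /* IPsec dst entry */

/* Copy the original route's idev into the IPsec entry, taking a reference
 * so the pointer stays valid for as long as the entry does. */
void bundle_init(struct bundle_model *b, const struct route_model *rt)
{
    b->idev = rt->idev;
    if (b->idev != NULL)
        idev_hold(b->idev);
}

/* Drop the reference when the IPsec entry is destroyed. */
void bundle_destroy(struct bundle_model *b)
{
    if (b->idev != NULL)
        idev_put(b->idev);
    b->idev = NULL;
}
```

Code that previously found a NULL idev on the IPsec entry can now read the original route's device through `b->idev`.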
    • [NETLINK]: Synchronous message processing. · 2a0a6ebe
      Herbert Xu committed
      Let's recap the problem.  The current asynchronous netlink kernel
      message processing is vulnerable to these attacks:
      
      1) Hit and run: Attacker sends one or more messages and then exits
      before they're processed.  This may confuse/disable the next netlink
      user that gets the netlink address of the attacker since it may
      receive the responses to the attacker's messages.
      
      Proposed solutions:
      
      a) Synchronous processing.
      b) Stream mode socket.
      c) Restrict/prohibit binding.
      
      2) Starvation: Because various netlink rcv functions were written
      to not return until all messages have been processed on a socket,
      it is possible for these functions to execute for an arbitrarily
      long period of time.  If this is successfully exploited it could
      also be used to hold rtnl forever.
      
      Proposed solutions:
      
      a) Synchronous processing.
      b) Stream mode socket.
      
      Firstly, let's cross off solution c).  It only solves the first
      problem and it has user-visible impacts.  In particular, it'll
      break user space applications that expect to bind or communicate
      with specific netlink addresses (pids).
      
      So we're left with a choice of synchronous processing versus
      SOCK_STREAM for netlink.
      
      For the moment I'm sticking with the synchronous approach as
      suggested by Alexey since it's simpler and I'd rather spend
      my time working on other things.
      
      However, it does have a number of deficiencies compared to the
      stream mode solution:
      
      1) User-space to user-space netlink communication is still vulnerable.
      
      2) Inefficient use of resources.  This is especially true for rtnetlink
      since the lock is shared with other users such as networking drivers.
      The latter could hold the rtnl while communicating with hardware, which
      causes the rtnetlink user to wait when it could be doing other things.
      
      3) It is still possible to DoS all netlink users by flooding the kernel
      netlink receive queue.  The attacker simply fills the receive socket
      with a single netlink message that fills up the entire queue.  The
      attacker then continues to call sendmsg with the same message in a loop.
      
      Point 3) can be countered by retransmissions in user-space code;
      however, that is pretty messy.
      
      In light of these problems (in particular, point 3), we should implement
      stream mode netlink at some point.  In the meantime, here is a patch
      that implements synchronous processing.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
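The "hit and run" problem and why synchronous processing closes it for kernel-bound messages can be shown with a toy model. Every name here is hypothetical: `struct nl_model`, `send_async()`, `send_sync()`, `process_queue()` and `stray_replies()`. Ports and processes are plain integers. The model covers only point 1. In the asynchronous case, replies are addressed by port when the backlog is finally processed, so they reach whoever owns the port at that moment. In the synchronous case, each request is answered inside the sender's own send call.

```c
#include <assert.h>

#define MAX_QUEUE 8

/* Hypothetical model of kernel-side netlink request handling. */
struct nl_model {
    int queue[MAX_QUEUE];   /* sender ports of queued requests */
    int queued;
    int owner[2];           /* owner[port] = process bound to that port */
    int replies_to[4];      /* replies received, indexed by process */
};

static void deliver_reply(struct nl_model *m, int port)
{
    m->replies_to[m->owner[port]]++;
}

/* Asynchronous: the request is queued and handled later. */
static void send_async(struct nl_model *m, int port)
{
    assert(m->queued < MAX_QUEUE);
    m->queue[m->queued++] = port;
}

static void process_queue(struct nl_model *m)
{
    for (int i = 0; i < m->queued; i++)
        deliver_reply(m, m->queue[i]);
    m->queued = 0;
}

/* Synchronous: the request is handled inside the sender's send call. */
static void send_sync(struct nl_model *m, int port)
{
    deliver_reply(m, port);
}

/* Attacker sends three requests, exits, and a victim rebinds the same
 * port; returns how many replies the victim receives. */
int stray_replies(int synchronous)
{
    struct nl_model m = { { 0 } };
    const int port = 1, attacker = 1, victim = 2;

    m.owner[port] = attacker;
    for (int i = 0; i < 3; i++) {
        if (synchronous)
            send_sync(&m, port);
        else
            send_async(&m, port);
    }
    m.owner[port] = victim;   /* attacker exits; victim reuses the port */
    process_queue(&m);        /* kernel gets around to any backlog */
    return m.replies_to[victim];
}
```

As the commit notes, this does nothing for user-to-user traffic or for flooding the kernel's receive queue. Those remain arguments for a stream-mode design.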
    • [RTNETLINK] Cleanup rtnetlink_link tables · db46edc6
      Thomas Graf committed
      Converts the remaining rtnetlink_link tables to use C99 designated
      initializers to make grepping a little bit easier.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
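C99 designated initializers let a dispatch table name each slot by its message type instead of relying on position. A search for a type name then lands directly on its handlers, and unlisted slots are zero-initialized. The sketch below uses hypothetical message types (`MSG_*`), handlers and a `dispatch()` helper, modeled loosely on a doit/dumpit table. The real RTM_* constants and handler signatures in the kernel differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types; not the real RTM_* values. */
enum {
    MSG_BASE = 16,
    MSG_NEWLINK = 16,
    MSG_DELLINK,
    MSG_GETLINK,
    MSG_NEWADDR,
    MSG_MAX
};

struct msg_link {
    int (*doit)(int arg);
    int (*dumpit)(int arg);
};

static int get_link(int arg)   { return arg + 1; }
static int dump_links(int arg) { return arg + 2; }
static int new_addr(int arg)   { return arg + 3; }

/* Each entry is labelled with the message type it serves; slots that are
 * not named (here NEWLINK and DELLINK) are zero-initialized (NULL). */
static const struct msg_link msg_table[MSG_MAX - MSG_BASE] = {
    [MSG_GETLINK - MSG_BASE] = { .doit = get_link, .dumpit = dump_links },
    [MSG_NEWADDR - MSG_BASE] = { .doit = new_addr },
};

/* Look up and call the doit handler; -1 if none is registered. */
int dispatch(int type, int arg)
{
    const struct msg_link *link;

    if (type < MSG_BASE || type >= MSG_MAX)
        return -1;
    link = &msg_table[type - MSG_BASE];
    return link->doit ? link->doit(arg) : -1;
}
```

Compared with a positional initializer list, reordering or inserting a message type cannot silently shift handlers into the wrong slots.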