1. 10 4月, 2006 4 次提交
    • H
      [INET]: Move no-tunnel ICMP error to tunnel4/tunnel6 · 50fba2aa
      Herbert Xu 提交于
      This patch moves the sending of ICMP messages when there are no IPv4/IPv6
      tunnels present to tunnel4/tunnel6 respectively.  Please note that for now
      if xfrm4_tunnel/xfrm6_tunnel is loaded then no ICMP messages will ever be
      sent.  This is similar to how we handle AH/ESP/IPCOMP.
      
      This move fixes the bug where we always send an ICMP message when there is
      no ip6_tunnel device present for a given packet even if it is later handled
      by IPsec.  It also causes ICMP messages to be sent when no IPIP tunnel is
      present.
      
      I've decided to use the "port unreachable" ICMP message over the current
      value of "address unreachable" (and "protocol unreachable" by GRE) because
      it is not ambiguous unlike the other ones which can be triggered by other
      conditions.  There seems to be no standard specifying what value must be
      used so this change should be OK.  In fact we should change GRE to use
      this value as well.
      
      Incidentally, this patch also fixes a fairly serious bug in xfrm6_tunnel
      where we don't check whether the embedded IPv6 header is present before
      dereferencing it for the inside source address.
      
      This patch is inspired by a previous patch by Hugo Santos <hsantos@av.it.pt>.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50fba2aa
    • P
      [NETFILTER]: Fix fragmentation issues with bridge netfilter · 2e2f7aef
      Patrick McHardy 提交于
      The conntrack code doesn't do re-fragmentation of defragmented packets
      anymore but relies on fragmentation in the IP layer. Purely bridged
      packets don't pass through the IP layer, so the bridge netfilter code
      needs to take care of fragmentation itself.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e2f7aef
    • R
      [FIB_TRIE]: Fix leaf freeing. · 550e29bc
      Robert Olsson 提交于
      Seems like leaf (end-nodes) has been freed by __tnode_free_rcu and not
      by __leaf_free_rcu. This fixes the problem. Only tnode_free is now
      used which checks for appropriate node type. free_leaf can be removed.
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      550e29bc
    • H
      [IPSEC]: Check x->encap before dereferencing it · 8bf4b8a1
      Herbert Xu 提交于
      We need to dereference x->encap before dereferencing it for encap_type.
      If it's absent then the encap_type is zero.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bf4b8a1
  2. 01 4月, 2006 6 次提交
  3. 29 3月, 2006 3 次提交
    • A
      [NETFILTER]: Rename init functions. · 65b4b4e8
      Andrew Morton 提交于
      Every netfilter module uses `init' for its module_init() function and
      `fini' or `cleanup' for its module_exit() function.
      
      Problem is, this creates uninformative initcall_debug output and makes
      ctags rather useless.
      
      So go through and rename them all to $(filename)_init and
      $(filename)_fini.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65b4b4e8
    • S
      [TCP]: Fix RFC2465 typo. · c3e5d877
      S P 提交于
      Signed-off-by: NS P <speattle@yahoo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3e5d877
    • H
      [INET]: Introduce tunnel4/tunnel6 · d2acc347
      Herbert Xu 提交于
      Basically this patch moves the generic tunnel protocol stuff out of
      xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c
      and tunnel6 respectively.
      
      The reason for this is that the problem that Hugo uncovered is only
      the tip of the iceberg.  The real problem is that when we removed the
      dependency of ipip on xfrm4_tunnel we didn't really consider the module
      case at all.
      
      For instance, as it is it's possible to build both ipip and xfrm4_tunnel
      as modules and if the latter is loaded then ipip simply won't load.
      
      After considering the alternatives I've decided that the best way out of
      this is to restore the dependency of ipip on the non-xfrm-specific part
      of xfrm4_tunnel.  This is acceptable IMHO because the intention of the
      removal was really to be able to use ipip without the xfrm subsystem.
      This is still preserved by this patch.
      
      So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
      the arbitration between the two.  The order of processing is determined
      by a simple integer which ensures that ipip gets processed before
      xfrm4_tunnel.
      
      The situation for ICMP handling is a little bit more complicated since
      we may not have enough information to determine who it's for.  It's not
      a big deal at the moment since the xfrm ICMP handlers are basically
      no-ops.  In future we can deal with this when we look at ICMP caching
      in general.
      
      The user-visible change to this is the removal of the TUNNEL Kconfig
      prompts.  This makes sense because it can only be used through IPCOMP
      as it stands.
      
      The addition of the new modules shouldn't introduce any problems since
      module dependency will cause them to be loaded.
      
      Oh and I also turned some unnecessary pskb's in IPv6 related to this
      patch to skb's.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2acc347
  4. 28 3月, 2006 1 次提交
    • A
      [PATCH] Notifier chain update: API changes · e041c683
      Alan Stern 提交于
      The kernel's implementation of notifier chains is unsafe.  There is no
      protection against entries being added to or removed from a chain while the
      chain is in use.  The issues were discussed in this thread:
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
      
      We noticed that notifier chains in the kernel fall into two basic usage
      classes:
      
      	"Blocking" chains are always called from a process context
      	and the callout routines are allowed to sleep;
      
      	"Atomic" chains can be called from an atomic context and
      	the callout routines are not allowed to sleep.
      
      We decided to codify this distinction and make it part of the API.  Therefore
      this set of patches introduces three new, parallel APIs: one for blocking
      notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
      really just the old API under a new name).  New kinds of data structures are
      used for the heads of the chains, and new routines are defined for
      registration, unregistration, and calling a chain.  The three APIs are
      explained in include/linux/notifier.h and their implementation is in
      kernel/sys.c.
      
      With atomic and blocking chains, the implementation guarantees that the chain
      links will not be corrupted and that chain callers will not get messed up by
      entries being added or removed.  For raw chains the implementation provides no
      guarantees at all; users of this API must provide their own protections.  (The
      idea was that situations may come up where the assumptions of the atomic and
      blocking APIs are not appropriate, so it should be possible for users to
      handle these things in their own way.)
      
      There are some limitations, which should not be too hard to live with.  For
      atomic/blocking chains, registration and unregistration must always be done in
      a process context since the chain is protected by a mutex/rwsem.  Also, a
      callout routine for a non-raw chain must not try to register or unregister
      entries on its own chain.  (This did happen in a couple of places and the code
      had to be changed to avoid it.)
      
      Since atomic chains may be called from within an NMI handler, they cannot use
      spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
      entirely in the unregister routine, which is okay since unregistration is much
      less frequent that calling a chain.
      
      Here is the list of chains that we adjusted and their classifications.  None
      of them use the raw API, so for the moment it is only a placeholder.
      
        ATOMIC CHAINS
        -------------
      arch/i386/kernel/traps.c:		i386die_chain
      arch/ia64/kernel/traps.c:		ia64die_chain
      arch/powerpc/kernel/traps.c:		powerpc_die_chain
      arch/sparc64/kernel/traps.c:		sparc64die_chain
      arch/x86_64/kernel/traps.c:		die_chain
      drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
      kernel/panic.c:				panic_notifier_list
      kernel/profile.c:			task_free_notifier
      net/bluetooth/hci_core.c:		hci_notifier
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
      net/ipv6/addrconf.c:			inet6addr_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
      net/netlink/af_netlink.c:		netlink_chain
      
        BLOCKING CHAINS
        ---------------
      arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
      arch/s390/kernel/process.c:		idle_chain
      arch/x86_64/kernel/process.c		idle_notifier
      drivers/base/memory.c:			memory_chain
      drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
      drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
      drivers/macintosh/adb.c:		adb_client_list
      drivers/macintosh/via-pmu.c		sleep_notifier_list
      drivers/macintosh/via-pmu68k.c		sleep_notifier_list
      drivers/macintosh/windfarm_core.c	wf_client_list
      drivers/usb/core/notify.c		usb_notifier_list
      drivers/video/fbmem.c			fb_notifier_list
      kernel/cpu.c				cpu_chain
      kernel/module.c				module_notify_list
      kernel/profile.c			munmap_notifier
      kernel/profile.c			task_exit_notifier
      kernel/sys.c				reboot_notifier_list
      net/core/dev.c				netdev_chain
      net/decnet/dn_dev.c:			dnaddr_chain
      net/ipv4/devinet.c:			inetaddr_chain
      
      It's possible that some of these classifications are wrong.  If they are,
      please let us know or submit a patch to fix them.  Note that any chain that
      gets called very frequently should be atomic, because the rwsem read-locking
      used for blocking chains is very likely to incur cache misses on SMP systems.
      (However, if the chain's callout routines may sleep then the chain cannot be
      atomic.)
      
      The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
      material written by Keith Owens and suggestions from Paul McKenney and Andrew
      Morton.
      
      [jes@sgi.com: restructure the notifier chain initialization macros]
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e041c683
  5. 27 3月, 2006 1 次提交
  6. 26 3月, 2006 1 次提交
    • D
      [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a
      Davide Libenzi 提交于
      Implement the half-closed devices notifiation, by adding a new POLLRDHUP
      (and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
      existing POLLHUP handling, that does not report correctly half-closed
      devices, was feared to be changed, this implementation leaves the current
      POLLHUP reporting unchanged and simply add a new bit that is set in the few
      places where it makes sense.  The same thing was discussed and conceptually
      agreed quite some time ago:
      
      http://lkml.org/lkml/2003/7/12/116
      
      Since this new event bit is added to the existing Linux poll infrastruture,
      even the existing poll/select system calls will be able to use it.  As far
      as the existing POLLHUP handling, the patch leaves it as is.  The
      pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
      archs and sets the bit in the six relevant files.  The other attached diff
      is the simple change required to sys/epoll.h to add the EPOLLRDHUP
      definition.
      
      There is "a stupid program" to test POLLRDHUP delivery here:
      
       http://www.xmailserver.org/pollrdhup-test.c
      
      It tests poll(2), but since the delivery is same epoll(2) will work equally.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f348d70a
  7. 25 3月, 2006 3 次提交
  8. 24 3月, 2006 1 次提交
  9. 23 3月, 2006 7 次提交
  10. 21 3月, 2006 13 次提交
    • J
      5e35941d
    • D
      [INET]: Fix typo in Arnaldo's connection sock compat fixups. · dbeff12b
      David S. Miller 提交于
      "struct inet_csk" --> "struct inet_connection_sock" :-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbeff12b
    • A
      [NET]: Identation & other cleanups related to compat_[gs]etsockopt cset · 543d9cfe
      Arnaldo Carvalho de Melo 提交于
      No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
      to just after the function exported, etc.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      543d9cfe
    • A
    • D
      [NET]: {get|set}sockopt compatibility layer · 3fdadf7d
      Dmitry Mishin 提交于
      This patch extends {get|set}sockopt compatibility layer in order to
      move protocol specific parts to their place and avoid huge universal
      net/compat.c file in the future.
      Signed-off-by: NDmitry Mishin <dim@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fdadf7d
    • C
      [SECURITY]: TCP/UDP getpeersec · 2c7946a7
      Catherine Zhang 提交于
      This patch implements an application of the LSM-IPSec networking
      controls whereby an application can determine the label of the
      security association its TCP or UDP sockets are currently connected to
      via getsockopt and the auxiliary data mechanism of recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of an IPSec security association a particular TCP or
      UDP socket is using.  The application can then use this security
      context to determine the security context for processing on behalf of
      the peer at the other end of this connection.  In the case of UDP, the
      security context is for each individual packet.  An example
      application is the inetd daemon, which could be modified to start
      daemons running at security contexts dependent on the remote client.
      
      Patch design approach:
      
      - Design for TCP
      The patch enables the SELinux LSM to set the peer security context for
      a socket based on the security context of the IPSec security
      association.  The application may retrieve this context using
      getsockopt.  When called, the kernel determines if the socket is a
      connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
      cache on the socket to retrieve the security associations.  If a
      security association has a security context, the context string is
      returned, as for UNIX domain sockets.
      
      - Design for UDP
      Unlike TCP, UDP is connectionless.  This requires a somewhat different
      API to retrieve the peer security context.  With TCP, the peer
      security context stays the same throughout the connection, thus it can
      be retrieved at any time between when the connection is established
      and when it is torn down.  With UDP, each read/write can have
      different peer and thus the security context might change every time.
      As a result the security context retrieval must be done TOGETHER with
      the packet retrieval.
      
      The solution is to build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).
      
      Patch implementation details:
      
      - Implementation for TCP
      The security context can be retrieved by applications using getsockopt
      with the existing SO_PEERSEC flag.  As an example (ignoring error
      checking):
      
      getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
      printf("Socket peer context is: %s\n", optbuf);
      
      The SELinux function, selinux_socket_getpeersec, is extended to check
      for labeled security associations for connected (TCP_ESTABLISHED ==
      sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
      struct dst_entry values that may refer to security associations.  If
      these have security associations with security contexts, the security
      context is returned.
      
      getsockopt returns a buffer that contains a security context string or
      the buffer is unmodified.
      
      - Implementation for UDP
      To retrieve the security context, the application first indicates to
      the kernel such desire by setting the IP_PASSSEC option via
      getsockopt.  Then the application retrieves the security context using
      the auxiliary data mechanism.
      
      An example server application for UDP should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_IP &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
      a server socket to receive security context of the peer.  A new
      ancillary message type SCM_SECURITY.
      
      When the packet is received we get the security context from the
      sec_path pointer which is contained in the sk_buff, and copy it to the
      ancillary message space.  An additional LSM hook,
      selinux_socket_getpeersec_udp, is defined to retrieve the security
      context from the SELinux space.  The existing function,
      selinux_socket_getpeersec does not suit our purpose, because the
      security context is copied directly to user space, rather than to
      kernel space.
      
      Testing:
      
      We have tested the patch by setting up TCP and UDP connections between
      applications on two machines using the IPSec policies that result in
      labeled security associations being built.  For TCP, we can then
      extract the peer security context using getsockopt on either end.  For
      UDP, the receiving end can retrieve the security context using the
      auxiliary data mechanism of recvmsg.
      Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c7946a7
    • R
      [TCP]: sysctl to allow TCP window > 32767 sans wscale · 15d99e02
      Rick Jones 提交于
      Back in the dark ages, we had to be conservative and only allow 15-bit
      window fields if the window scale option was not negotiated.  Some
      ancient stacks used a signed 16-bit quantity for the window field of
      the TCP header and would get confused.
      
      Those days are long gone, so we can use the full 16-bits by default
      now.
      
      There is a sysctl added so that we can still interact with such old
      stacks
      Signed-off-by: NRick Jones <rick.jones2@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15d99e02
    • N
    • D
      [NETFILTER]: Fix warnings in ip_nat_snmp_basic.c · edb2c34f
      David S. Miller 提交于
      net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode':
      net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function
      net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function
      net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate':
      net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function
      net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edb2c34f
    • I
      [NET]: sem2mutex part 2 · 57b47a53
      Ingo Molnar 提交于
      Semaphore to mutex conversion.
      
      The conversion was generated via scripts, and the result was validated
      automatically via a script as well.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57b47a53
    • A
      [NET] sem2mutex: net/ · 4a3e2f71
      Arjan van de Ven 提交于
      Semaphore to mutex conversion.
      
      The conversion was generated via scripts, and the result was validated
      automatically via a script as well.
      Signed-off-by: NArjan van de Ven <arjan@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a3e2f71
    • S
      [NET]: dev_put/dev_hold cleanup · 15333061
      Stephen Hemminger 提交于
      Get rid of the old __dev_put macro that is just a hold over from pre 2.6
      kernel.  And turn dev_hold into an inline instead of a macro.
      Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15333061
    • S
      [NET]: Convert RTNL to mutex. · 6756ae4b
      Stephen Hemminger 提交于
      This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
      gets rid of some of the leftover legacy.
      Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6756ae4b