1. 18 6月, 2006 5 次提交
  2. 27 5月, 2006 1 次提交
  3. 13 5月, 2006 1 次提交
    • S
      [NEIGH]: Fix IP-over-ATM and ARP interaction. · bd89efc5
      Simon Kelley 提交于
      The classical IP over ATM code maintains its own IPv4 <-> <ATM stuff>
      ARP table, using the standard neighbour-table code. The
      neigh_table_init function adds this neighbour table to a linked list
      of all neighbor tables which is used by the functions neigh_delete()
      neigh_add() and neightbl_set(), all called by the netlink code.
      
      Once the ATM neighbour table is added to the list, there are two
      tables with family == AF_INET there, and ARP entries sent via netlink
      go into the first table with matching family. This is indeterminate
      and often wrong.
      
      To see the bug, on a kernel with CLIP enabled, create a standard IPv4
      ARP entry by pinging an unused address on a local subnet. Then attempt
      to complete that entry by doing
      
      ip neigh replace <ip address> lladdr <some mac address> nud reachable
      
      Looking at the ARP tables by using 
      
      ip neigh show
      
      will reveal two ARP entries for the same address. One of these can be
      found in /proc/net/arp, and the other in /proc/net/atm/arp.
      
      This patch adds a new function, neigh_table_init_no_netlink() which
      does everything the neigh_table_init() does, except add the table to
      the netlink all-arp-tables chain. In addition neigh_table_init() has a
      check that all tables on the chain have a distinct address family.
      The init call in clip.c is changed to call
      neigh_table_init_no_netlink().
      
      Since ATM ARP tables are rather more complicated than can currently be
      handled by the available rtattrs in the netlink protocol, no
      functionality is lost by this patch, and non-ATM ARP manipulation via
      netlink is rescued. A more complete solution would involve a rtattr
      for ATM ARP entries and some way for the netlink code to give
      neigh_add and friends more information than just address family with
      which to find the correct ARP table.
      
      [ I've changed the assertion checking in neigh_table_init() to not
        use BUG_ON() while holding neigh_tbl_lock.  Instead we remember that
        we found an existing tbl with the same family, and after dropping
        the lock we'll give a diagnostic kernel log message and a stack dump.
        -DaveM ]
      Signed-off-by: NSimon Kelley <simon@thekelleys.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd89efc5
  4. 11 5月, 2006 1 次提交
  5. 10 5月, 2006 2 次提交
  6. 07 5月, 2006 1 次提交
  7. 20 4月, 2006 3 次提交
  8. 19 4月, 2006 1 次提交
  9. 11 4月, 2006 1 次提交
  10. 10 4月, 2006 4 次提交
  11. 31 3月, 2006 1 次提交
  12. 30 3月, 2006 1 次提交
    • D
      [NET]: Deinline some larger functions from netdevice.h · 56079431
      Denis Vlasenko 提交于
      On a allyesconfig'ured kernel:
      
      Size  Uses Wasted Name and definition
      ===== ==== ====== ================================================
         95  162  12075 netif_wake_queue      include/linux/netdevice.h
        129   86   9265 dev_kfree_skb_any     include/linux/netdevice.h
        127   56   5885 netif_device_attach   include/linux/netdevice.h
         73   86   4505 dev_kfree_skb_irq     include/linux/netdevice.h
         46   60   1534 netif_device_detach   include/linux/netdevice.h
        119   16   1485 __netif_rx_schedule   include/linux/netdevice.h
        143    5    492 netif_rx_schedule     include/linux/netdevice.h
         81    7    366 netif_schedule        include/linux/netdevice.h
      
      netif_wake_queue is big because __netif_schedule is a big inline:
      
      static inline void __netif_schedule(struct net_device *dev)
      {
              if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
                      unsigned long flags;
                      struct softnet_data *sd;
      
                      local_irq_save(flags);
                      sd = &__get_cpu_var(softnet_data);
                      dev->next_sched = sd->output_queue;
                      sd->output_queue = dev;
                      raise_softirq_irqoff(NET_TX_SOFTIRQ);
                      local_irq_restore(flags);
              }
      }
      
      static inline void netif_wake_queue(struct net_device *dev)
      {
      #ifdef CONFIG_NETPOLL_TRAP
              if (netpoll_trap())
                      return;
      #endif
              if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
                      __netif_schedule(dev);
      }
      
      By de-inlining __netif_schedule we are saving a lot of text
      at each callsite of netif_wake_queue and netif_schedule.
      __netif_rx_schedule is also big, and it makes more sense to keep
      both of them out of line.
      
      Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
      instead... oh well.
      
      netif_device_attach/detach are not hot paths, we can deinline them too.
      Signed-off-by: NDenis Vlasenko <vda@ilport.com.ua>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56079431
  13. 29 3月, 2006 1 次提交
    • D
      [NET]: deinline 200+ byte inlines in sock.h · f0088a50
      Denis Vlasenko 提交于
      Sizes in bytes (allyesconfig, i386) and files where those inlines
      are used:
      
      238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
      238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
      238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
      238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
      238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
      238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
      238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
      238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
      238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
      238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
      238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
      238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
      238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
      238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o
      
      276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
      276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
      276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
      276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
      276 sk_receive_skb 2.6.16/drivers/net/pppoe.o
      
      209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
      209 sk_dst_check 2.6.16/net/ipv4/udp.o
      209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o
      
      Large inlines with multiple callers:
      Size  Uses Wasted Name and definition
      ===== ==== ====== ================================================
        238   21   4360 sock_queue_rcv_skb    include/net/sock.h
        109   10    801 sock_recv_timestamp   include/net/sock.h
        276    4    768 sk_receive_skb        include/net/sock.h
         94    8    518 __sk_dst_check        include/net/sock.h
        209    3    378 sk_dst_check  include/net/sock.h
        131    4    333 sk_setup_caps include/net/sock.h
        152    2    132 sk_stream_alloc_pskb  include/net/sock.h
        125    2    105 sk_stream_writequeue_purge    include/net/sock.h
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0088a50
  14. 28 3月, 2006 1 次提交
    • A
      [PATCH] Notifier chain update: API changes · e041c683
      Alan Stern 提交于
      The kernel's implementation of notifier chains is unsafe.  There is no
      protection against entries being added to or removed from a chain while the
      chain is in use.  The issues were discussed in this thread:
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
      
      We noticed that notifier chains in the kernel fall into two basic usage
      classes:
      
      	"Blocking" chains are always called from a process context
      	and the callout routines are allowed to sleep;
      
      	"Atomic" chains can be called from an atomic context and
      	the callout routines are not allowed to sleep.
      
      We decided to codify this distinction and make it part of the API.  Therefore
      this set of patches introduces three new, parallel APIs: one for blocking
      notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
      really just the old API under a new name).  New kinds of data structures are
      used for the heads of the chains, and new routines are defined for
      registration, unregistration, and calling a chain.  The three APIs are
      explained in include/linux/notifier.h and their implementation is in
      kernel/sys.c.
      
      With atomic and blocking chains, the implementation guarantees that the chain
      links will not be corrupted and that chain callers will not get messed up by
      entries being added or removed.  For raw chains the implementation provides no
      guarantees at all; users of this API must provide their own protections.  (The
      idea was that situations may come up where the assumptions of the atomic and
      blocking APIs are not appropriate, so it should be possible for users to
      handle these things in their own way.)
      
      There are some limitations, which should not be too hard to live with.  For
      atomic/blocking chains, registration and unregistration must always be done in
      a process context since the chain is protected by a mutex/rwsem.  Also, a
      callout routine for a non-raw chain must not try to register or unregister
      entries on its own chain.  (This did happen in a couple of places and the code
      had to be changed to avoid it.)
      
      Since atomic chains may be called from within an NMI handler, they cannot use
      spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
      entirely in the unregister routine, which is okay since unregistration is much
      less frequent that calling a chain.
      
      Here is the list of chains that we adjusted and their classifications.  None
      of them use the raw API, so for the moment it is only a placeholder.
      
        ATOMIC CHAINS
        -------------
      arch/i386/kernel/traps.c:		i386die_chain
      arch/ia64/kernel/traps.c:		ia64die_chain
      arch/powerpc/kernel/traps.c:		powerpc_die_chain
      arch/sparc64/kernel/traps.c:		sparc64die_chain
      arch/x86_64/kernel/traps.c:		die_chain
      drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
      kernel/panic.c:				panic_notifier_list
      kernel/profile.c:			task_free_notifier
      net/bluetooth/hci_core.c:		hci_notifier
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
      net/ipv6/addrconf.c:			inet6addr_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
      net/netlink/af_netlink.c:		netlink_chain
      
        BLOCKING CHAINS
        ---------------
      arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
      arch/s390/kernel/process.c:		idle_chain
      arch/x86_64/kernel/process.c		idle_notifier
      drivers/base/memory.c:			memory_chain
      drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
      drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
      drivers/macintosh/adb.c:		adb_client_list
      drivers/macintosh/via-pmu.c		sleep_notifier_list
      drivers/macintosh/via-pmu68k.c		sleep_notifier_list
      drivers/macintosh/windfarm_core.c	wf_client_list
      drivers/usb/core/notify.c		usb_notifier_list
      drivers/video/fbmem.c			fb_notifier_list
      kernel/cpu.c				cpu_chain
      kernel/module.c				module_notify_list
      kernel/profile.c			munmap_notifier
      kernel/profile.c			task_exit_notifier
      kernel/sys.c				reboot_notifier_list
      net/core/dev.c				netdev_chain
      net/decnet/dn_dev.c:			dnaddr_chain
      net/ipv4/devinet.c:			inetaddr_chain
      
      It's possible that some of these classifications are wrong.  If they are,
      please let us know or submit a patch to fix them.  Note that any chain that
      gets called very frequently should be atomic, because the rwsem read-locking
      used for blocking chains is very likely to incur cache misses on SMP systems.
      (However, if the chain's callout routines may sleep then the chain cannot be
      atomic.)
      
      The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
      material written by Keith Owens and suggestions from Paul McKenney and Andrew
      Morton.
      
      [jes@sgi.com: restructure the notifier chain initialization macros]
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e041c683
  15. 27 3月, 2006 1 次提交
  16. 26 3月, 2006 2 次提交
    • D
      [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a
      Davide Libenzi 提交于
      Implement the half-closed devices notifiation, by adding a new POLLRDHUP
      (and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
      existing POLLHUP handling, that does not report correctly half-closed
      devices, was feared to be changed, this implementation leaves the current
      POLLHUP reporting unchanged and simply add a new bit that is set in the few
      places where it makes sense.  The same thing was discussed and conceptually
      agreed quite some time ago:
      
      http://lkml.org/lkml/2003/7/12/116
      
      Since this new event bit is added to the existing Linux poll infrastruture,
      even the existing poll/select system calls will be able to use it.  As far
      as the existing POLLHUP handling, the patch leaves it as is.  The
      pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
      archs and sets the bit in the six relevant files.  The other attached diff
      is the simple change required to sys/epoll.h to add the EPOLLRDHUP
      definition.
      
      There is "a stupid program" to test POLLRDHUP delivery here:
      
       http://www.xmailserver.org/pollrdhup-test.c
      
      It tests poll(2), but since the delivery is same epoll(2) will work equally.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f348d70a
    • A
      [PATCH] slab: implement /proc/slab_allocators · 871751e2
      Al Viro 提交于
      Implement /proc/slab_allocators.   It produces output like:
      
      idr_layer_cache: 80 idr_pre_get+0x33/0x4e
      buffer_head: 2555 alloc_buffer_head+0x20/0x75
      mm_struct: 9 mm_alloc+0x1e/0x42
      mm_struct: 20 dup_mm+0x36/0x370
      vm_area_struct: 384 dup_mm+0x18f/0x370
      vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
      vm_area_struct: 1 split_vma+0x5a/0x10e
      vm_area_struct: 11 do_brk+0x206/0x2e2
      vm_area_struct: 2 copy_vma+0xda/0x142
      vm_area_struct: 9 setup_arg_pages+0x99/0x214
      fs_cache: 8 copy_fs_struct+0x21/0x133
      fs_cache: 29 copy_process+0xf38/0x10e3
      files_cache: 30 alloc_files+0x1b/0xcf
      signal_cache: 81 copy_process+0xbaa/0x10e3
      sighand_cache: 77 copy_process+0xe65/0x10e3
      sighand_cache: 1 de_thread+0x4d/0x5f8
      anon_vma: 241 anon_vma_prepare+0xd9/0xf3
      size-2048: 1 add_sect_attrs+0x5f/0x145
      size-2048: 2 journal_init_revoke+0x99/0x302
      size-2048: 2 journal_init_revoke+0x137/0x302
      size-2048: 2 journal_init_inode+0xf9/0x1c4
      
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Alexander Nyberg <alexn@telia.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      DESC
      slab-leaks3-locking-fix
      EDESC
      From: Andrew Morton <akpm@osdl.org>
      
      Update for slab-remove-cachep-spinlock.patch
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Alexander Nyberg <alexn@telia.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      871751e2
  17. 25 3月, 2006 2 次提交
  18. 23 3月, 2006 2 次提交
    • J
      [PATCH] WE-20 for kernel 2.6.16 · 711e2c33
      Jean Tourrilhes 提交于
      	This is version 20 of the Wireless Extensions. This is the
      completion of the RtNetlink work I started early 2004, it enables the
      full Wireless Extension API over RtNetlink.
      
      	Few comments on the patch :
      	o totally driver transparent, no change in drivers needed.
      	o iwevent were already RtNetlink based since they were created
      (around 2.5.7). This adds all the regular SET and GET requests over
      RtNetlink, using the exact same mechanism and data format as iwevents.
      	o This is a Kconfig option, as currently most people have no
      need for it. Surprisingly, patch is actually small and well
      encapsulated.
      	o Tested on SMP, attention as been paid to make it 64 bits clean.
      	o Code do probably too many checks and could be further
      optimised, but better safe than sorry.
      	o RtNetlink based version of the Wireless Tools available on
      my web page for people inclined to try out this stuff.
      
      	I would also like to thank Alexey Kuznetsov for his helpful
      suggestions to make this patch better.
      Signed-off-by: NJean Tourrilhes <jt@hpl.hp.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      711e2c33
    • S
      ca6549af
  19. 21 3月, 2006 9 次提交
    • A
      [NET]: Identation & other cleanups related to compat_[gs]etsockopt cset · 543d9cfe
      Arnaldo Carvalho de Melo 提交于
      No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
      to just after the function exported, etc.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      543d9cfe
    • A
      [SK_BUFF]: export skb_pull_rcsum · f94691ac
      Arnaldo Carvalho de Melo 提交于
      *** Warning: "skb_pull_rcsum" [net/bridge/bridge.ko] undefined!
      *** Warning: "skb_pull_rcsum" [net/8021q/8021q.ko] undefined!
      *** Warning: "skb_pull_rcsum" [drivers/net/pppoe.ko] undefined!
      *** Warning: "skb_pull_rcsum" [drivers/net/ppp_generic.ko] undefined!
      Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f94691ac
    • D
      [NET]: {get|set}sockopt compatibility layer · 3fdadf7d
      Dmitry Mishin 提交于
      This patch extends {get|set}sockopt compatibility layer in order to
      move protocol specific parts to their place and avoid huge universal
      net/compat.c file in the future.
      Signed-off-by: NDmitry Mishin <dim@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fdadf7d
    • H
      [NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum · cbb042f9
      Herbert Xu 提交于
      We're now starting to have quite a number of places that do skb_pull
      followed immediately by an skb_postpull_rcsum.  We can merge these two
      operations into one function with skb_pull_rcsum.  This makes sense
      since most pull operations on receive skb's need to update the
      checksum.
      
      I've decided to make this out-of-line since it is fairly big and the
      fast path where hardware checksums are enabled need to call
      csum_partial anyway.
      
      Since this is a brand new function we get to add an extra check on the
      len argument.  As it is most callers of skb_pull ignore its return
      value which essentially means that there is no check on the len
      argument.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbb042f9
    • C
      [SECURITY]: TCP/UDP getpeersec · 2c7946a7
      Catherine Zhang 提交于
      This patch implements an application of the LSM-IPSec networking
      controls whereby an application can determine the label of the
      security association its TCP or UDP sockets are currently connected to
      via getsockopt and the auxiliary data mechanism of recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of an IPSec security association a particular TCP or
      UDP socket is using.  The application can then use this security
      context to determine the security context for processing on behalf of
      the peer at the other end of this connection.  In the case of UDP, the
      security context is for each individual packet.  An example
      application is the inetd daemon, which could be modified to start
      daemons running at security contexts dependent on the remote client.
      
      Patch design approach:
      
      - Design for TCP
      The patch enables the SELinux LSM to set the peer security context for
      a socket based on the security context of the IPSec security
      association.  The application may retrieve this context using
      getsockopt.  When called, the kernel determines if the socket is a
      connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
      cache on the socket to retrieve the security associations.  If a
      security association has a security context, the context string is
      returned, as for UNIX domain sockets.
      
      - Design for UDP
      Unlike TCP, UDP is connectionless.  This requires a somewhat different
      API to retrieve the peer security context.  With TCP, the peer
      security context stays the same throughout the connection, thus it can
      be retrieved at any time between when the connection is established
      and when it is torn down.  With UDP, each read/write can have
      different peer and thus the security context might change every time.
      As a result the security context retrieval must be done TOGETHER with
      the packet retrieval.
      
      The solution is to build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).
      
      Patch implementation details:
      
      - Implementation for TCP
      The security context can be retrieved by applications using getsockopt
      with the existing SO_PEERSEC flag.  As an example (ignoring error
      checking):
      
      getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
      printf("Socket peer context is: %s\n", optbuf);
      
      The SELinux function, selinux_socket_getpeersec, is extended to check
      for labeled security associations for connected (TCP_ESTABLISHED ==
      sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
      struct dst_entry values that may refer to security associations.  If
      these have security associations with security contexts, the security
      context is returned.
      
      getsockopt returns a buffer that contains a security context string or
      the buffer is unmodified.
      
      - Implementation for UDP
      To retrieve the security context, the application first indicates to
      the kernel such desire by setting the IP_PASSSEC option via
      getsockopt.  Then the application retrieves the security context using
      the auxiliary data mechanism.
      
      An example server application for UDP should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_IP &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
      a server socket to receive security context of the peer.  A new
      ancillary message type SCM_SECURITY.
      
      When the packet is received we get the security context from the
      sec_path pointer which is contained in the sk_buff, and copy it to the
      ancillary message space.  An additional LSM hook,
      selinux_socket_getpeersec_udp, is defined to retrieve the security
      context from the SELinux space.  The existing function,
      selinux_socket_getpeersec does not suit our purpose, because the
      security context is copied directly to user space, rather than to
      kernel space.
      
      Testing:
      
      We have tested the patch by setting up TCP and UDP connections between
      applications on two machines using the IPSec policies that result in
      labeled security associations being built.  For TCP, we can then
      extract the peer security context using getsockopt on either end.  For
      UDP, the receiving end can retrieve the security context using the
      auxiliary data mechanism of recvmsg.
      Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c7946a7
    • A
      [NET] sem2mutex: net/ · 4a3e2f71
      Arjan van de Ven 提交于
      Semaphore to mutex conversion.
      
      The conversion was generated via scripts, and the result was validated
      automatically via a script as well.
      Signed-off-by: NArjan van de Ven <arjan@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a3e2f71
    • S
      [NET]: minor net_rx_action optimization · 8aca8a27
      Stephen Hemminger 提交于
      The functions list_del followed by list_add_tail is equivalent to the
      existing inline list_move_tail. list_move_tail avoids unnecessary
      _LIST_POISON.
      Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8aca8a27
    • M
      [NET]: Move destructor from neigh->ops to neigh_params · c5ecd62c
      Michael S. Tsirkin 提交于
      struct neigh_ops currently has a destructor field, which no in-kernel
      drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
      driver stashes some info in the neighbour structure (the results of
      the second-stage lookup from ARP results to real link-level path), and
      it uses neigh->ops->destructor to get a callback so it can clean up
      this extra info when a neighbour is freed.  We've run into problems
      with this: since the destructor is in an ops field that is shared
      between neighbours that may belong to different net devices, there's
      no way to set/clear it safely.
      
      The following patch moves this field to neigh_parms where it can be
      safely set, together with its twin neigh_setup.  Two additional
      patches in the patch series update ipoib to use this new interface.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5ecd62c
    • L
      [PKTGEN]: Updates version. · 53dcb0e3
      Luiz Capitulino 提交于
      Due to the thread's lock changes, we're at a new version now.
      Signed-off-by: NLuiz Capitulino <lcapitulino@mandriva.com.br>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53dcb0e3