1. 26 Apr, 2007 2 commits
    • [NET]: convert network timestamps to ktime_t · b7aa0bf7
      Eric Dumazet committed
      We currently use a special structure (struct skb_timeval) and plain
      'struct timeval' to store packet timestamps in sk_buffs and struct
      sock.
      
      This has some drawbacks:
      - Resolution is fixed at one microsecond.
      - Space is wasted on 64-bit platforms, where sizeof(struct timeval)=16.
      
      I suggest using ktime_t that is a nice abstraction of high resolution
      time services, currently capable of nanosecond resolution.
      
      As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
      an 8-byte shrink of this structure on 64-bit architectures. Some other
      structures also benefit from this size reduction (struct ipq in
      ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
      
      Once this ktime infrastructure is adopted, we can more easily provide
      nanosecond resolution on top of it (ioctl SIOCGSTAMPNS and/or
      SO_TIMESTAMPNS/SCM_TIMESTAMPNS).
      
      Note: this patch includes a bug fix in
      compat_sock_get_timestamp(), where an "err = 0;" was missing (so this
      syscall returned -ENOENT instead of 0).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      CC: Stephen Hemminger <shemminger@linux-foundation.org>
      CC: John find <linux.kernel@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7aa0bf7
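The move from a seconds/microseconds pair to a single 64-bit value can be illustrated from user space. This is only a sketch, not kernel code: `ns_stamp_t` is a hypothetical stand-in for ktime_t (assumed here to be a signed 64-bit nanosecond count), and the helper names and `NS_PER_*` constants are invented for illustration. It handles non-negative timestamps only.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for ktime_t: signed 64-bit nanoseconds. */
typedef int64_t ns_stamp_t;

#define NS_PER_SEC  1000000000LL
#define NS_PER_USEC 1000LL

/* Pack a microsecond-resolution timeval into one 64-bit value. */
static ns_stamp_t ns_from_timeval(struct timeval tv)
{
    return (ns_stamp_t)tv.tv_sec * NS_PER_SEC
         + (ns_stamp_t)tv.tv_usec * NS_PER_USEC;
}

/* Unpack again; digits below one microsecond are truncated. */
static struct timeval timeval_from_ns(ns_stamp_t ns)
{
    struct timeval tv;
    tv.tv_sec  = ns / NS_PER_SEC;
    tv.tv_usec = (ns % NS_PER_SEC) / NS_PER_USEC;
    return tv;
}
```

The stamp always occupies 8 bytes, whereas sizeof(struct timeval) depends on the platform, which is the space saving the commit message describes.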
    • [NET]: Keep sk_backlog near sk_lock · fa438ccf
      Eric Dumazet committed
      sk_backlog is a critical field of struct sock. (known famous words)
      
      It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
      tcp_v4_rcv(), sk_receive_skb().
      
      It really makes sense to place it next to sk_lock, because sk_backlog is only
      used after sk_lock has been taken (so its cache line is already in the L1
      cache). This should reduce cache misses and sk_lock acquisition time.
      
      (In theory, we could move only the head pointer near sk_lock and leave the
      tail far away, because 'tail' is normally not so hot, but let's keep it
      simple :) )
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa438ccf
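The cache-line argument can be checked with offsetof on a toy layout. The struct below is a hypothetical, heavily simplified stand-in for struct sock, not the kernel definition. The 64-byte line size is an assumption, and so is the struct starting on a cache-line boundary.

```c
#include <stddef.h>

#define ASSUMED_CACHE_LINE 64  /* assumed L1 line size in bytes */

/* Hypothetical, simplified stand-in for struct sock. */
struct toy_sock {
    int lock;                  /* plays the role of sk_lock */
    struct {
        void *head;
        void *tail;
    } backlog;                 /* placed right after the lock */
    char other_fields[256];    /* everything else in the structure */
    void *far_field;           /* a field far away from the lock */
};

/* Index of the cache line that holds a given offset, assuming the
 * structure itself starts on a cache-line boundary. */
static size_t line_of(size_t offset)
{
    return offset / ASSUMED_CACHE_LINE;
}
```

Comparing line_of(offsetof(struct toy_sock, lock)) with line_of(offsetof(struct toy_sock, backlog.tail)) shows that taking the lock also pulls both backlog pointers into the cache, while far_field sits on a different line.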
  2. 23 Mar, 2007 1 commit
  3. 07 Mar, 2007 1 commit
  4. 13 Feb, 2007 1 commit
  5. 11 Feb, 2007 1 commit
  6. 08 Dec, 2006 2 commits
  7. 04 Dec, 2006 1 commit
  8. 03 Dec, 2006 2 commits
  9. 06 Nov, 2006 1 commit
  10. 11 Oct, 2006 1 commit
  11. 23 Sep, 2006 3 commits
  12. 04 Jul, 2006 2 commits
  13. 01 Jul, 2006 1 commit
  14. 30 Jun, 2006 1 commit
    • [AF_UNIX]: Datagram getpeersec · 877ce7c1
      Catherine Zhang committed
      This patch implements an API whereby an application can determine the
      label of its peer's Unix datagram sockets via the auxiliary data mechanism of
      recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of the peer of a Unix datagram socket.  The application
      can then use this security context to determine the security context for
      processing on behalf of the peer who sent the packet.
      
      Patch design and implementation:
      
      The design and implementation is very similar to the UDP case for INET
      sockets.  Basically we build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).  To retrieve the security
      context, the application first tells the kernel that it wants it by
      setting the SO_PASSSEC option via setsockopt.  Then the application
      retrieves the security context using the auxiliary data mechanism.
      
      An example server application for Unix datagram socket should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      /* setsockopt() takes the option length by value, not by pointer. */
      setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          /* Accept only a security context that fits in scontext. */
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_SOCKET &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              /* Copy only the bytes the kernel actually attached. */
              memcpy(&scontext, CMSG_DATA(cmsg_hdr),
                     cmsg_hdr->cmsg_len - CMSG_LEN(0));
          }
      }
      
      sock_setsockopt is enhanced with a new socket option, SO_PASSSEC, to allow
      a server socket to receive the security context of the peer.
      
      Testing:
      
      We have tested the patch by setting up Unix datagram client and server
      applications.  We verified that the server can retrieve the security context
      using the auxiliary data mechanism of recvmsg.
      Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      877ce7c1
  15. 18 Jun, 2006 1 commit
  16. 31 Mar, 2006 1 commit
  17. 29 Mar, 2006 1 commit
    • [NET]: deinline 200+ byte inlines in sock.h · f0088a50
      Denis Vlasenko committed
      Sizes in bytes (allyesconfig, i386) and files where those inlines
      are used:
      
      238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
      238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
      238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
      238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
      238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
      238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
      238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
      238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
      238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
      238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
      238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
      238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
      238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
      238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
      238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
      238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o
      
      276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
      276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
      276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
      276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
      276 sk_receive_skb 2.6.16/drivers/net/pppoe.o
      
      209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
      209 sk_dst_check 2.6.16/net/ipv4/udp.o
      209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o
      
      Large inlines with multiple callers:
      Size  Uses Wasted Name and definition
      ===== ==== ====== ================================================
        238   21   4360 sock_queue_rcv_skb          include/net/sock.h
        109   10    801 sock_recv_timestamp         include/net/sock.h
        276    4    768 sk_receive_skb              include/net/sock.h
         94    8    518 __sk_dst_check              include/net/sock.h
        209    3    378 sk_dst_check                include/net/sock.h
        131    4    333 sk_setup_caps               include/net/sock.h
        152    2    132 sk_stream_alloc_pskb        include/net/sock.h
        125    2    105 sk_stream_writequeue_purge  include/net/sock.h
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0088a50
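The commit message does not say how the "Wasted" column is computed. Every row is, however, consistent with (Size - 20) * (Uses - 1), that is, each expansion beyond the first costs the inline body minus roughly 20 bytes for an out-of-line call. The sketch below encodes that inferred formula. The 20-byte constant and the function name are assumptions drawn from the table, not from the tool that produced it.

```c
/* Assumed cost in bytes of one out-of-line call site. The figure is
 * inferred from the table rows, not stated in the commit. */
#define ASSUMED_CALL_SITE_BYTES 20

/* Bytes saved by keeping a single out-of-line copy and turning every
 * other expansion of the inline into a plain call. */
static long wasted_bytes(long inline_size, long uses)
{
    return (inline_size - ASSUMED_CALL_SITE_BYTES) * (uses - 1);
}
```

For example, wasted_bytes(238, 21) gives the 4360 bytes listed for sock_queue_rcv_skb.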
  18. 25 Mar, 2006 1 commit
  19. 21 Mar, 2006 3 commits
    • [NET]: Identation & other cleanups related to compat_[gs]etsockopt cset · 543d9cfe
      Arnaldo Carvalho de Melo committed
      No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
      to just after the function exported, etc.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      543d9cfe
    • [NET]: {get|set}sockopt compatibility layer · 3fdadf7d
      Dmitry Mishin committed
      This patch extends the {get|set}sockopt compatibility layer in order to
      move protocol-specific parts to their proper place and avoid a huge,
      universal net/compat.c file in the future.
      Signed-off-by: Dmitry Mishin <dim@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fdadf7d
    • [SECURITY]: TCP/UDP getpeersec · 2c7946a7
      Catherine Zhang committed
      This patch implements an application of the LSM-IPSec networking
      controls whereby an application can determine the label of the
      security association its TCP or UDP sockets are currently connected to
      via getsockopt and the auxiliary data mechanism of recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of an IPSec security association a particular TCP or
      UDP socket is using.  The application can then use this security
      context to determine the security context for processing on behalf of
      the peer at the other end of this connection.  In the case of UDP, the
      security context is for each individual packet.  An example
      application is the inetd daemon, which could be modified to start
      daemons running at security contexts dependent on the remote client.
      
      Patch design approach:
      
      - Design for TCP
      The patch enables the SELinux LSM to set the peer security context for
      a socket based on the security context of the IPSec security
      association.  The application may retrieve this context using
      getsockopt.  When called, the kernel determines if the socket is a
      connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
      cache on the socket to retrieve the security associations.  If a
      security association has a security context, the context string is
      returned, as for UNIX domain sockets.
      
      - Design for UDP
      Unlike TCP, UDP is connectionless.  This requires a somewhat different
      API to retrieve the peer security context.  With TCP, the peer
      security context stays the same throughout the connection, thus it can
      be retrieved at any time between when the connection is established
      and when it is torn down.  With UDP, each read/write can have a
      different peer and thus the security context might change every time.
      As a result the security context retrieval must be done TOGETHER with
      the packet retrieval.
      
      The solution is to build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).
      
      Patch implementation details:
      
      - Implementation for TCP
      The security context can be retrieved by applications using getsockopt
      with the existing SO_PEERSEC flag.  As an example (ignoring error
      checking):
      
      getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
      printf("Socket peer context is: %s\n", optbuf);
      
      The SELinux function, selinux_socket_getpeersec, is extended to check
      for labeled security associations for connected (TCP_ESTABLISHED ==
      sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
      struct dst_entry values that may refer to security associations.  If
      these have security associations with security contexts, the security
      context is returned.
      
      getsockopt returns a buffer that contains a security context string or
      the buffer is unmodified.
      
      - Implementation for UDP
      To retrieve the security context, the application first tells the
      kernel that it wants it by setting the IP_PASSSEC option via
      setsockopt.  Then the application retrieves the security context using
      the auxiliary data mechanism.
      
      An example server application for UDP should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      /* setsockopt() takes the option length by value, not by pointer. */
      setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          /* Accept only a security context that fits in scontext. */
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_IP &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              /* Copy only the bytes the kernel actually attached. */
              memcpy(&scontext, CMSG_DATA(cmsg_hdr),
                     cmsg_hdr->cmsg_len - CMSG_LEN(0));
          }
      }
      
      ip_setsockopt is enhanced with a new socket option, IP_PASSSEC, to allow
      a server socket to receive the security context of the peer.  A new
      ancillary message type, SCM_SECURITY, is also added.
      
      When the packet is received we get the security context from the
      sec_path pointer which is contained in the sk_buff, and copy it to the
      ancillary message space.  An additional LSM hook,
      selinux_socket_getpeersec_udp, is defined to retrieve the security
      context from the SELinux space.  The existing function,
      selinux_socket_getpeersec does not suit our purpose, because the
      security context is copied directly to user space, rather than to
      kernel space.
      
      Testing:
      
      We have tested the patch by setting up TCP and UDP connections between
      applications on two machines using the IPSec policies that result in
      labeled security associations being built.  For TCP, we can then
      extract the peer security context using getsockopt on either end.  For
      UDP, the receiving end can retrieve the security context using the
      auxiliary data mechanism of recvmsg.
      Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c7946a7
  20. 12 Jan, 2006 1 commit
  21. 04 Jan, 2006 1 commit
  22. 09 Nov, 2005 1 commit
  23. 28 Oct, 2005 1 commit
  24. 09 Oct, 2005 1 commit
  25. 28 Sep, 2005 1 commit
    • [NET]: Fix module reference counts for loadable protocol modules · a79af59e
      Frank Filz committed
      I have been experimenting with loadable protocol modules, and ran into
      several issues with module reference counting.
      
      The first issue was that __module_get failed at the BUG_ON check at
      the top of the routine (checking that my module reference count was
      not zero) when I created the first socket. When sk_alloc() is called,
      my module reference count was still 0. When I looked at why sctp
      didn't have this problem, I discovered that sctp creates a control
      socket during module init (when the module ref count is not 0), which
      keeps the reference count non-zero. This section has been updated to
      address the point Stephen raised about checking the return value of
      try_module_get().
      
      The next problem arose when my socket init routine returned an error.
      This resulted in my module reference count being decremented below 0.
      My socket ops->release routine was also being called. The issue here
      is that sock_release() calls the ops->release routine and decrements
      the ref count if sock->ops is not NULL. Since the socket probably
      didn't get correctly initialized, this should not be done, so we will
      set sock->ops to NULL because we will not call try_module_get().
      
      While searching for another bug, I also noticed that sys_accept() has
      a possibility of doing a module_put() when it did not do an
      __module_get so I re-ordered the call to security_socket_accept().
      Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a79af59e
  26. 07 Sep, 2005 2 commits
    • [NET]: proto_unregister: fix sleeping while atomic · 0a3f4358
      Patrick McHardy committed
      proto_unregister holds a lock while calling kmem_cache_destroy, which
      can sleep.
      
      Noticed by Daniele Orlandi <daniele@orlandi.com>.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a3f4358
    • [NET]: Make sure l_linger is unsigned to avoid negative timeouts · 9261c9b0
      Eric Dumazet committed
      The log of one of my x86_64 (Linux 2.6.13) servers is filled with:
      
      schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
      schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
      schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
      schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
      schedule_timeout: wrong timeout value ffffffffffffff06 from ffffffff802e63ca
      
      This is because some application does the following:
      
      struct linger li;
      li.l_onoff = 1;
      li.l_linger = -1;
      setsockopt(sock, SOL_SOCKET, SO_LINGER, &li, sizeof(li));
      
      Unfortunately, l_linger is defined as a 'signed int' in
      include/linux/socket.h:
      
      struct linger {
               int             l_onoff;        /* Linger active                */
               int             l_linger;       /* How long to linger for       */
      };
      
      I don't know if it's safe to change l_linger to 'unsigned int' in the
      include file (it might be defined as int in ABI specs).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9261c9b0
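The logged value can be reproduced with a little arithmetic: ffffffffffffff06 is -250 as a 64-bit number, which is what l_linger = -1 produces if it is multiplied by a tick rate of 250. The sketch below contrasts the signed and unsigned readings. It is an illustration rather than the kernel change itself: the 250 Hz tick rate is inferred from the log line, the function names are hypothetical, and a 64-bit long is assumed.

```c
/* Tick rate assumed for illustration (inferred from the log value). */
#define ASSUMED_HZ 250L

/* Signed reading: a negative l_linger becomes a negative timeout. */
static long linger_ticks_signed(int l_linger)
{
    return (long)l_linger * ASSUMED_HZ;
}

/* Unsigned reading: the same bits are treated as a large positive value,
 * so the timeout can never be negative (assumes a 64-bit long, so the
 * product cannot overflow). */
static long linger_ticks_unsigned(int l_linger)
{
    return (long)(unsigned int)l_linger * ASSUMED_HZ;
}
```

With l_linger = -1 the signed version yields -250, the bit pattern seen in the schedule_timeout warning, while the unsigned version yields a very long but valid timeout.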
  27. 06 Sep, 2005 1 commit
  28. 30 Aug, 2005 4 commits