1. 04 Jan, 2006 8 commits
    • [NET]: Add a dev_ioctl() fallback to sock_ioctl() · b5e5fa5e
      Committed by Christoph Hellwig
      Currently all network protocols need to call dev_ioctl as the default
      fallback in their ioctl implementations.  This patch adds a fallback
      to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
      This way all the protocol ioctl handlers can be simplified and we don't
      need to export dev_ioctl.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
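The fallback described in this commit can be sketched in userspace C as follows. This is a minimal illustration, not the kernel code: the `*_sketch` functions and the command numbers are hypothetical, and `ENOIOCTLCMD` is defined locally because it is a kernel-internal code that userspace `<errno.h>` does not provide.

```c
#include <errno.h>

/* Kernel-internal error code, defined here only for the sketch. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Hypothetical command numbers, chosen only for illustration. */
enum { CMD_PROTO_SKETCH = 1, CMD_DEV_SKETCH = 2 };

/* Stand-in for a protocol handler: it no longer calls the device layer
 * itself, it just reports that it does not know the command. */
static int proto_ioctl_sketch(unsigned int cmd)
{
    if (cmd == CMD_PROTO_SKETCH)
        return 0;
    return -ENOIOCTLCMD;
}

/* Stand-in for the device-level handler (assumed to reject unknown commands). */
static int dev_ioctl_sketch(unsigned int cmd)
{
    return cmd == CMD_DEV_SKETCH ? 0 : -EINVAL;
}

/* The socket layer tries the protocol first and falls back to the
 * device handler only when the protocol returned -ENOIOCTLCMD. */
static int sock_ioctl_sketch(unsigned int cmd)
{
    int err = proto_ioctl_sketch(cmd);

    if (err == -ENOIOCTLCMD)
        err = dev_ioctl_sketch(cmd);
    return err;
}
```

With the fallback in one place, each protocol handler only needs to return -ENOIOCTLCMD for commands it does not recognise.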
    • [NET]: Speed up __alloc_skb() · 4947d3ef
      Committed by Benjamin LaHaise
      From: Benjamin LaHaise <bcrl@kvack.org>
      
      In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the 
      shared info structure results in gcc being forced to do a reload of the 
      pointer since it has no information on possible aliasing.  Fix this by 
      using a pointer to refer to skb_shared_info.
      
      By initializing skb_shared_info sequentially, the write combining buffers 
      can reduce the number of memory transactions to a single write.  Reorder 
      the initialization in __alloc_skb() to match the structure definition.  
      There is also an alignment issue on 64-bit systems with skb_shared_info;
      by converting nr_frags to a short, everything packs up nicely.
      
      Also, pass the slab cache pointer according to the fclone flag instead 
      of using two almost identical function calls.
      
      This raises bw_unix performance up to a peak of 707KB/s when combined 
      with the spinlock patch.  It should help other networking protocols, too.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h · 14c85021
      Committed by Arnaldo Carvalho de Melo
      To help in reducing the number of include dependencies, several files were
      touched as they were getting needed headers indirectly for stuff they use.
      
      Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c
      included linux/dccp.h twice.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PKTGEN]: Deinitialise static variables. · f34fbb97
      Committed by Jaco Kroon
      static variables should not be explicitly initialised to 0.  This causes
      them to be placed in .data instead of .bss.  This patch de-initialises 3
      static variables in net/core/pktgen.c.
      
      There are approximately 800 more such variables in the source tree
      (2.6.15rc5).  If there is more interest, I'd be willing to track down the
      rest of these and de-initialise them as well.
      Signed-off-by: David S. Miller <davem@davemloft.net>
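The distinction this commit relies on can be sketched as below. The variable and function names are hypothetical. The C standard guarantees that both variables start at zero; which section each one lands in depends on the toolchain, and newer GCC releases may place both in .bss by default.

```c
/* Explicitly initialised to zero: with the toolchains the commit refers to,
 * this could be emitted into .data and take space in the image. */
static int explicit_zero_sketch = 0;

/* No initialiser: static storage is zero-initialised anyway and the
 * variable can live in .bss. */
static int implicit_zero_sketch;

/* Both variables read as zero, so dropping "= 0" does not change behaviour. */
static int sum_statics_sketch(void)
{
    return explicit_zero_sketch + implicit_zero_sketch;
}
```

Compiling this file and inspecting the object with a tool such as `size` or `objdump` is one way to check what a given compiler does.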
    • [TWSK]: Introduce struct timewait_sock_ops · 6d6ee43e
      Committed by Arnaldo Carvalho de Melo
      So that we can share several timewait sockets related functions and
      make the timewait mini sockets infrastructure closer to the request
      mini sockets one.
      
      Next changesets will take advantage of this, moving more code out of
      TCP and DCCP v4 and v6 to common infrastructure.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Avoid atomic xchg() for non-error case · c1cbe4b7
      Committed by Benjamin LaHaise
      It also looks like there were 2 places where the test on sk_err was
      missing from the event wait logic (in sk_stream_wait_connect and
      sk_stream_wait_memory), while the rest of the sock_error() users look
      to be doing the right thing.  This version of the patch fixes those,
      and cleans up a few places that were testing ->sk_err directly.
      Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
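The idea in the commit title, skipping the atomic exchange when no error is pending, can be sketched with C11 atomics. This is a userspace approximation under stated assumptions: `struct sock_sketch` and `sock_error_sketch()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the socket's pending-error field. */
struct sock_sketch {
    atomic_int sk_err;
};

/* Return the pending error as a negative value and clear it.
 * The common case (no error) takes only a plain load and
 * avoids the more expensive atomic read-modify-write. */
static int sock_error_sketch(struct sock_sketch *sk)
{
    if (atomic_load_explicit(&sk->sk_err, memory_order_relaxed) == 0)
        return 0;
    return -atomic_exchange(&sk->sk_err, 0);
}
```

The exchange is still needed on the error path so that a pending error is reported exactly once.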
    • [IP]: Simplify and consolidate MSG_PEEK error handling · 3305b80c
      Committed by Herbert Xu
      When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
      it is left on the socket receive queue.  This means that when we detect
      a checksum error we have to be careful when trying to free the packet
      as someone could have dequeued it in the time being.
      
      Currently this delicate logic is duplicated three times between UDPv4,
      UDPv6 and RAWv6.  This patch moves it into one place and simplifies
      the code somewhat.
      
      This is based on a suggestion by Eric Dumazet.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [LSM-IPSec]: Security association restriction. · df71837d
      Committed by Trent Jaeger
      This patch series implements per packet access control via the
      extension of the Linux Security Modules (LSM) interface by hooks in
      the XFRM and pfkey subsystems that leverage IPSec security
      associations to label packets.  Extensions to the SELinux LSM are
      included that leverage the patch for this purpose.
      
      This patch implements the changes necessary to the XFRM subsystem,
      pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
      socket to use only authorized security associations (or no security
      association) to send/receive network packets.
      
      Patch purpose:
      
      The patch is designed to enable per-packet access control based on
      the strongly authenticated IPSec security association.  Such access
      controls augment the existing ones based on network interface and IP
      address.  The former are very coarse-grained, and the latter can be
      spoofed.  By using IPSec, the system can control access to remote
      hosts based on cryptographic keys generated using the IPSec mechanism.
      This enables access control on a per-machine basis or per-application
      if the remote machine is running the same mechanism and trusted to
      enforce the access control policy.
      
      Patch design approach:
      
      The overall approach is that policy (xfrm_policy) entries set by
      user-level programs (e.g., setkey for ipsec-tools) are extended with a
      security context that is used at policy selection time in the XFRM
      subsystem to restrict the sockets that can send/receive packets via
      security associations (xfrm_states) that are built from those
      policies.
      
      A presentation available at
      www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
      from the SELinux symposium describes the overall approach.
      
      Patch implementation details:
      
      On output, the policy retrieved (via xfrm_policy_lookup or
      xfrm_sk_policy_lookup) must be authorized for the security context of
      the socket and the same security context is required for resultant
      security association (retrieved or negotiated via racoon in
      ipsec-tools).  This is enforced in xfrm_state_find.
      
      On input, the policy retrieved must also be authorized for the socket
      (at __xfrm_policy_check), and the security context of the policy must
      also match the security association being used.
      
      The patch has virtually no impact on packets that do not use IPSec.
      The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
      before.
      
      Also, if IPSec is used without security contexts, the impact is
      minimal.  The LSM must allow such policies to be selected for the
      combination of socket and remote machine, but subsequent IPSec
      processing proceeds as in the original case.
      
      Testing:
      
      The pfkey interface is tested using ipsec-tools.  ipsec-tools has
      been modified (a separate ipsec-tools patch is available for version
      0.5) to support assignment of xfrm_policy entries and security
      associations with security contexts via setkey, and negotiation
      using the security contexts via racoon.
      
      The xfrm_user interface is tested via ad hoc programs that set
      security contexts.  These programs are also available from me, and
      contain programs for setting, getting, and deleting policy for testing
      this interface.  Testing of sa functions was done by tracing kernel
      behavior.
      Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Dec, 2005 1 commit
  3. 09 Dec, 2005 1 commit
  4. 06 Dec, 2005 1 commit
  5. 21 Nov, 2005 1 commit
  6. 11 Nov, 2005 1 commit
    • [NET]: Detect hardware rx checksum faults correctly · fb286bb2
      Committed by Herbert Xu
      Here is the patch that introduces the generic skb_checksum_complete
      which also checks for hardware RX checksum faults.  If that happens,
      it'll call netdev_rx_csum_fault which currently prints out a stack
      trace with the device name.  In future it can turn off RX checksum.
      
      I've converted every spot under net/ that does RX checksum checks to
      use skb_checksum_complete or __skb_checksum_complete with the
      exceptions of:
      
      * Those places where checksums are done bit by bit.  These will call
      netdev_rx_csum_fault directly.
      
      * The following have not been completely checked/converted:
      
      ipmr
      ip_vs
      netfilter
      dccp
      
      This patch is based on patches and suggestions from Stephen Hemminger
      and David S. Miller.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 10 Nov, 2005 3 commits
    • 9ac4a169
    • [NETLINK]: Make netlink_callback->done() optional · a8f74b22
      Committed by Thomas Graf
      Most netlink families make no use of the done() callback; making
      it optional gets rid of all unnecessary dummy implementations.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
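An optional callback of this kind usually comes down to a NULL check at the call site. The sketch below assumes a simplified, hypothetical `struct netlink_callback_sketch`; it is not the kernel's `struct netlink_callback`.

```c
#include <stddef.h>

/* Hypothetical, simplified callback holder. */
struct netlink_callback_sketch {
    int (*done)(struct netlink_callback_sketch *cb); /* may be NULL */
    int done_calls;                                  /* bookkeeping for the example */
};

/* Example done() implementation that just counts invocations. */
static int count_done_sketch(struct netlink_callback_sketch *cb)
{
    cb->done_calls++;
    return 0;
}

/* Finish a dump: call done() only if the family supplied one,
 * so families without cleanup work need no dummy function. */
static void finish_dump_sketch(struct netlink_callback_sketch *cb)
{
    if (cb->done)
        cb->done(cb);
}
```

A family with nothing to clean up would simply leave `done` set to NULL.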
    • [NETFILTER]: Add nf_conntrack subsystem. · 9fb9cbb1
      Committed by Yasuyuki Kozakai
      The existing connection tracking subsystem in netfilter can only
      handle ipv4.  There were basically two choices present to add
      connection tracking support for ipv6.  We could either duplicate all
      of the ipv4 connection tracking code into an ipv6 counterpart, or (the
      choice taken by these patches) we could design a generic layer that
      could handle both ipv4 and ipv6, thus requiring only one sub-protocol
      (TCP, UDP, etc.) connection tracking helper module to be written.
      
      In fact nf_conntrack is capable of working with any layer 3
      protocol.
      
      The existing ipv4 specific conntrack code could also not deal
      with the peculiarities of doing connection tracking on ipv6,
      which is also cured here.  For example, these issues include:
      
      1) ICMPv6 handling, which is used for neighbour discovery in
         ipv6 thus some messages such as these should not participate
         in connection tracking since effectively they are like ARP
         messages
      
      2) fragmentation must be handled differently in ipv6, because
         the simplistic "defrag, connection track and NAT, refrag"
         (which the existing ipv4 connection tracking does) approach simply
         isn't feasible in ipv6
      
      3) ipv6 extension header parsing must occur at the correct spots
         before and after connection tracking decisions, and there were
         no provisions for this in the existing connection tracking
         design
      
      4) ipv6 has no need for stateful NAT
      
      The ipv4 specific conntrack layer is kept around, until all of
      the ipv4 specific conntrack helpers are ported over to nf_conntrack
      and it is feature complete.  Once that occurs, the old conntrack
      stuff will get placed into the feature-removal-schedule and we will
      fully kill it off 6 months later.
      Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
      Signed-off-by: Harald Welte <laforge@netfilter.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  8. 09 Nov, 2005 1 commit
  9. 06 Nov, 2005 1 commit
    • [NET]: Fix race condition in sk_stream_wait_connect · 6151b31c
      Committed by Herbert Xu
      When sk_stream_wait_connect detects a state transition to ESTABLISHED
      or CLOSE_WAIT prior to it going to sleep, it will return without
      calling finish_wait and decrementing sk_write_pending.
      
      This may result in crashes and other unintended behaviour.
      
      The fix is to always call finish_wait and update sk_write_pending since
      it is safe to do so even if the wait entry is no longer on the queue.
      
      This bug was tracked down with the help of Alex Sidorenko and the
      fix is also based on his suggestion.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  10. 03 Nov, 2005 1 commit
  11. 29 Oct, 2005 1 commit
    • [IPv4/IPv6]: UFO Scatter-gather approach · e89e9cf5
      Committed by Ananda Raju
      Attached is the kernel patch for the UDP Fragmentation Offload (UFO) feature.
      
      1. This patch incorporates the review comments by Jeff Garzik.
      2. Renamed USO as UFO (UDP Fragmentation Offload)
      3. udp sendfile support with UFO
      
      This patch uses the scatter-gather feature of skb to generate large UDP
      datagrams. Below is a "how-to" on the changes required in a network device
      driver to use the UFO interface.
      
      UDP Fragmentation Offload (UFO) Interface:
      -------------------------------------------
      UFO is a feature wherein the Linux kernel network stack will offload the
      IP fragmentation functionality of large UDP datagrams to hardware. This
      will reduce the overhead of the stack in fragmenting large UDP datagrams into
      MTU-sized packets.
      
      1) Drivers indicate their capability of UFO using
      dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG
      
      NETIF_F_HW_CSUM is required for UFO over ipv6.
      
      2) UFO packet will be submitted for transmission using driver xmit routine.
      UFO packet will have a non-zero value for
      
      "skb_shinfo(skb)->ufo_size"
      
      skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
      fragment going out of the adapter after IP fragmentation by hardware.
      
      skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
      contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
      indicating that hardware has to do checksum calculation. Hardware should
      compute the UDP checksum of complete datagram and also ip header checksum of
      each fragmented IP packet.
      
      For IPV6 the UFO provides the fragment identification-id in
      skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
      IPv6 fragments.
      Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
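The capability rules in the how-to above can be sketched as a small feature-mask check. The flag names and bit values here are hypothetical placeholders, not the kernel's NETIF_F_* constants, and treating scatter-gather as mandatory is an assumption drawn from the patch's scatter-gather approach.

```c
#include <stdbool.h>

/* Hypothetical placeholder bits; the real NETIF_F_* values differ. */
enum {
    F_SG_SKETCH      = 1 << 0, /* scatter-gather */
    F_HW_CSUM_SKETCH = 1 << 1, /* hardware checksum over the whole packet */
    F_UFO_SKETCH     = 1 << 2  /* UDP fragmentation offload */
};

/* Decide whether a large UDP datagram may be handed to the device
 * unfragmented. Per the commit text, hardware checksumming is
 * required for UFO over IPv6. */
static bool can_use_ufo_sketch(unsigned int features, bool is_ipv6)
{
    unsigned int need = F_UFO_SKETCH | F_SG_SKETCH;

    if (is_ipv6)
        need |= F_HW_CSUM_SKETCH;
    return (features & need) == need;
}
```

A driver advertising all three bits would pass the check for both address families.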
  12. 28 Oct, 2005 1 commit
  13. 27 Oct, 2005 1 commit
    • [PATCH] kill massive wireless-related log spam · 35848e04
      Committed by Jeff Garzik
      Although this message is having the intended effect of causing wireless
      driver maintainers to upgrade their code, I never should have merged this
      patch in its present form.  It has led to tons of bug reports and unhappy
      users.
      
      Some wireless apps poll for statistics regularly, which leads to a printk()
      every single time they ask for stats.  That's a little bit _too_ much of a
      reminder that the driver is using an old API.
      
      Change this to print the message once per kernel boot.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
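A print-once guard of the kind described here is commonly built on a static flag. The sketch below is a userspace approximation: the function name and the message text are hypothetical, and `fprintf` stands in for `printk()`.

```c
#include <stdio.h>

/* Emit the deprecation notice only on the first call.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
static int warn_old_api_once_sketch(const char *dev_name)
{
    static int printed; /* zero until the first warning is emitted */

    if (printed)
        return 0;
    printed = 1;
    fprintf(stderr, "%s: driver uses the old statistics API\n", dev_name);
    return 1;
}
```

Applications that poll statistics repeatedly then trigger a single log line instead of one per request. The flag is global, so a second device using the old API would not be reported separately.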
  14. 26 Oct, 2005 6 commits
  15. 23 Oct, 2005 4 commits
    • [NEIGH] Fix timer leak in neigh_changeaddr · 49636bb1
      Committed by Herbert Xu
      neigh_changeaddr attempts to delete neighbour timers without setting
      nud_state.  This doesn't work because the timer may have already fired
      when we acquire the write lock in neigh_changeaddr.  The result is that
      the timer may keep firing for quite a while until the entry reaches
      NEIGH_FAILED.
      
      It should be setting the nud_state straight away so that if the timer
      has already fired it can simply exit once we relinquish the lock.
      
      In fact, this whole function is simply duplicating the logic in
      neigh_ifdown which in turn is already doing the right thing when
      it comes to deleting timers and setting nud_state.
      
      So all we have to do is take that code out and put it into a common
      function and make both neigh_changeaddr and neigh_ifdown call it.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • [NEIGH] Fix add_timer race in neigh_add_timer · 6fb9974f
      Committed by Herbert Xu
      neigh_add_timer cannot use add_timer unconditionally.  The reason is that
      by the time it has obtained the write lock someone else (e.g., neigh_update)
      could have already added a new timer.
      
      So it should only use mod_timer and deal with its return value accordingly.
      
      This bug would have led to rare neighbour cache entry leaks.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • [NEIGH] Print stack trace in neigh_add_timer · 20375502
      Committed by Herbert Xu
      Stack traces are very helpful in determining the exact nature of a bug.
      So let's print a stack trace when the timer is added twice.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • [SK_BUFF]: ipvs_property field must be copied · c98d80ed
      Committed by Julian Anastasov
      IPVS used the flag NFC_IPVS_PROPERTY in nfcache, but now that nfcache has been
      removed the new flag 'ipvs_property' still needs to be copied. This patch should be
      included in 2.6.14.
      
      Further comments from Harald Welte:
      
      Sorry, seems like the bug was introduced by me.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Harald Welte <laforge@netfilter.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  16. 09 Oct, 2005 1 commit
  17. 04 Oct, 2005 3 commits
  18. 29 Sep, 2005 1 commit
  19. 28 Sep, 2005 3 commits
    • [NET]: Fix module reference counts for loadable protocol modules · a79af59e
      Committed by Frank Filz
      I have been experimenting with loadable protocol modules, and ran into
      several issues with module reference counting.
      
      The first issue was that __module_get failed at the BUG_ON check at
      the top of the routine (checking that my module reference count was
      not zero) when I created the first socket. When sk_alloc() is called,
      my module reference count was still 0. When I looked at why sctp
      didn't have this problem, I discovered that sctp creates a control
      socket during module init (when the module ref count is not 0), which
      keeps the reference count non-zero. This section has been updated to
      address the point Stephen raised about checking the return value of
      try_module_get().
      
      The next problem arose when my socket init routine returned an error.
      This resulted in my module reference count being decremented below 0.
      My socket ops->release routine was also being called. The issue here
      is that sock_release() calls the ops->release routine and decrements
      the ref count if sock->ops is not NULL. Since the socket probably
      didn't get correctly initialized, this should not be done, so we will
      set sock->ops to NULL because we will not call try_module_get().
      
      While searching for another bug, I also noticed that sys_accept() has
      a possibility of doing a module_put() when it did not do an
      __module_get so I re-ordered the call to security_socket_accept().
      Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Prefetch dev->qdisc_lock in dev_queue_xmit() · 2d7ceece
      Committed by Eric Dumazet
      We know the lock is going to be taken.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Use non-recursive algorithm in skb_copy_datagram_iovec() · bc8dfcb9
      Committed by Daniel Phillips
      Use iteration instead of recursion.  Fraglists within fraglists
      should never occur, so we BUG check this.
      Signed-off-by: Daniel Phillips <phillips@istop.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
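The iterative approach can be sketched in userspace C as below. The buffer type and `copy_flat_sketch()` are hypothetical simplifications: a head buffer carries one level of fragments, and an `assert` plays the role of the kernel's BUG check for nested fragment lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer: data plus an optional fragment list. */
struct buf_sketch {
    const char *data;
    size_t len;
    struct buf_sketch *next;      /* next fragment in the same list */
    struct buf_sketch *frag_list; /* fragments hanging off this buffer */
};

/* Copy the head and then each fragment into out, up to cap bytes.
 * A loop replaces recursion; fragments must not carry their own lists. */
static size_t copy_flat_sketch(const struct buf_sketch *head, char *out, size_t cap)
{
    size_t n = head->len < cap ? head->len : cap;
    size_t copied;
    const struct buf_sketch *frag;

    memcpy(out, head->data, n);
    copied = n;

    for (frag = head->frag_list; frag && copied < cap; frag = frag->next) {
        assert(frag->frag_list == NULL); /* nested fraglists should never occur */
        n = frag->len < cap - copied ? frag->len : cap - copied;
        memcpy(out + copied, frag->data, n);
        copied += n;
    }
    return copied;
}
```

Because the walk is a plain loop, stack usage stays constant regardless of how many fragments are chained.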