  1. 10 Dec, 2011 1 commit
  2. 14 May, 2011 1 commit
    • net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7
      Vasiliy Kulikov committed
      This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
      ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
      without any special privileges.  In other words, the patch makes it
      possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
      order not to increase the kernel's attack surface, the new functionality
      is disabled by default, but is enabled at bootup by supporting Linux
      distributions, optionally with restriction to a group or a group range
      (see below).
      
      Similar functionality is implemented in Mac OS X:
      http://www.manpagez.com/man/4/icmp/
      
      A new ping socket is created with
      
          socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)
      
      Message identifiers (octets 4-5 of the ICMP header) are interpreted as
      local ports. Addresses are stored in struct sockaddr_in. No port numbers
      are reserved for privileged processes; port 0 is reserved by the API
      ("let the kernel pick a free number"). There is no notion of remote
      ports: remote port numbers provided by the user (e.g. in connect()) are
      ignored.
      
      Data sent and received include ICMP headers. This is deliberate to:
      1) Avoid the need to transport header values such as sequence numbers
      by other means.
      2) Make it easier to port existing programs using raw sockets.
      
      ICMP headers given to send() are checked and sanitized. The type must be
      ICMP_ECHO and the code must be zero (future extensions might relax this,
      see below). The id is set to the number (local port) of the socket, the
      checksum is always recomputed.
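      A minimal userspace sketch of the send-side checks just described, assuming the standard 8-byte ICMP echo layout (type, code, checksum, id, sequence) and the RFC 1071 Internet checksum. The names `ping_sanitize` and `inet_checksum` are hypothetical; this illustrates the rules, not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_ECHO 8 /* Echo Request type (RFC 792) */

/* RFC 1071 Internet checksum over a buffer in network byte order. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;   /* pad the odd byte with 0 */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    return (uint16_t)~sum;
}

/* Hypothetical helper: enforce type 8 / code 0, overwrite the id with
 * the socket's local port and recompute the checksum in place.
 * Returns 0 on success, -1 if the header is rejected. */
static int ping_sanitize(uint8_t *msg, size_t len, uint16_t local_port)
{
    uint16_t csum;

    if (len < 8)
        return -1;                            /* no room for ICMP header */
    if (msg[0] != ICMP_ECHO || msg[1] != 0)
        return -1;                            /* only Echo, code 0 */
    msg[4] = (uint8_t)(local_port >> 8);      /* id = local port */
    msg[5] = (uint8_t)(local_port & 0xff);
    msg[2] = 0;                               /* checksum field must be */
    msg[3] = 0;                               /* zero while summing     */
    csum = inet_checksum(msg, len);
    msg[2] = (uint8_t)(csum >> 8);
    msg[3] = (uint8_t)(csum & 0xff);
    return 0;
}
```

      After a successful call, running `inet_checksum()` over the whole message yields 0, which is how a receiver validates it.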
      
      ICMP reply packets received from the network are demultiplexed according
      to their ids and are returned by recv() without any modification.
      IP header information and ICMP errors of those packets may be obtained
      via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
      quenches and redirects are reported as fake errors via the error queue
      (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
      network order).
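      The demultiplexing step can be pictured as a lookup keyed by the ICMP id, since the id plays the role of the local port. This toy sketch uses a hypothetical flat table indexed by id (`ping_table`, `ping_bind`, `ping_lookup` are illustrative names); the kernel's real data structure is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_ECHOREPLY 0 /* Echo Reply type (RFC 792) */

/* Hypothetical table: ICMP id -> socket handle (0 = unbound). */
static int ping_table[65536];

/* Bind a non-zero handle to an id. Id 0 is reserved for "let the
 * kernel pick", so it is never bound explicitly here. */
static int ping_bind(uint16_t id, int sock)
{
    if (id == 0 || sock == 0 || ping_table[id] != 0)
        return -1;
    ping_table[id] = sock;
    return 0;
}

/* Return the handle owning an incoming Echo Reply, or 0 if none. */
static int ping_lookup(const uint8_t *icmp, size_t len)
{
    uint16_t id;

    if (len < 8 || icmp[0] != ICMP_ECHOREPLY)
        return 0;
    id = (uint16_t)((icmp[4] << 8) | icmp[5]);
    return ping_table[id];
}
```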
      
      socket(2) is restricted to the group range specified in
      "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
      that nobody (not even root) may create ping sockets.  Setting it to "100
      100" would grant permission to a single group (either to make
      /sbin/ping g+s and owned by that group, or to grant permission to the
      "netadmins" group); "0 4294967295" would enable it for the world; "100
      4294967295" would enable it for users, but not daemons.
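      Assuming a caller may create a ping socket when any of its group ids falls inside the inclusive range, the check reduces to the sketch below. `ping_group_allowed` is a hypothetical name, and exactly which group ids the kernel consults is not specified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical check against a "low high" string as found in
 * /proc/sys/net/ipv4/ping_group_range: allowed if any of the caller's
 * group ids lies in [low, high]. "1 0" is an empty range. */
static bool ping_group_allowed(const char *range,
                               const unsigned long *gids, size_t ngids)
{
    unsigned long low, high;
    size_t i;

    if (sscanf(range, "%lu %lu", &low, &high) != 2)
        return false;                 /* unparsable: deny */
    for (i = 0; i < ngids; i++)
        if (low <= gids[i] && gids[i] <= high)
            return true;
    return false;
}
```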
      
      The existing code could be extended rather easily (in the unlikely case
      that anyone needs it) to handle other similar pairs of ICMP messages
      (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply,
      etc.).
      
      Userspace ping utility and a patch for it:
      http://openwall.info/wiki/people/segoon/ping
      
      For Openwall GNU/*/Linux it was the last step on the road to the
      setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
      is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
      http://mirrors.kernel.org/openwall/Owl/current/iso/
      
      Initially this functionality was written by Pavel Kankovsky for
      Linux 2.4.32, but unfortunately it was never made public.
      
      All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) have been tested
      with the patch.
      
      PATCH v3:
          - switched to flowi4.
          - minor changes to be consistent with raw sockets code.
      
      PATCH v2:
          - changed ping_debug() to pr_debug().
          - removed CONFIG_IP_PING.
          - removed ping_seq_fops.owner field (unused for procfs).
          - switched to proc_net_fops_create().
          - switched to %pK in seq_printf().
      
      PATCH v1:
          - fixed checksumming bug.
          - CAP_NET_RAW may not create icmp sockets anymore.
      
      RFC v2:
          - minor cleanups.
          - introduced sysctl'able group range to restrict socket(2).
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Feb, 2011 1 commit
    • ipv4: Remove fib_hash. · 3630b7c0
      David S. Miller committed
      The time has finally come to remove the hash based routing table
      implementation in ipv4.
      
      FIB Trie is mature and well tested, and I've audited its code to
      confirm that it implements insert, delete, and lookup with semantics
      identical to those of fib_hash.
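      "The same semantics" boils down to longest-prefix matching: a lookup returns the most specific route covering the address. The sketch below is a plain one-bit-per-level binary trie with hypothetical names (`trie_insert`, `trie_lookup`, an integer `nexthop` standing in for a route entry); fib_trie itself is a level- and path-compressed trie, which this does not attempt to reproduce, and delete is omitted for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2];
    int has_route;
    int nexthop;              /* hypothetical stand-in for a route entry */
};

static struct trie_node *trie_node_new(void)
{
    struct trie_node *n = calloc(1, sizeof(*n));
    if (!n)
        abort();
    return n;
}

/* Insert prefix/len (IPv4 address in host byte order). */
static void trie_insert(struct trie_node *root, uint32_t prefix, int len,
                        int nexthop)
{
    struct trie_node *n = root;
    int i;

    for (i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = trie_node_new();
        n = n->child[bit];
    }
    n->has_route = 1;
    n->nexthop = nexthop;
}

/* Longest-prefix match: walk the address bits and remember the deepest
 * node that carries a route. Returns -1 if nothing matches. */
static int trie_lookup(const struct trie_node *root, uint32_t addr)
{
    const struct trie_node *n = root;
    int best = -1;
    int i;

    for (i = 0; n != NULL; i++) {
        if (n->has_route)
            best = n->nexthop;
        if (i == 32)
            break;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return best;
}
```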
      
      If there are any semantic differences found in fib_trie, we should
      simply fix them.
      
      I've placed the trie statistic config option under advanced router
      configuration.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
  4. 22 Aug, 2010 1 commit
    • PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol) · 00959ade
      Dmitry Kozlov committed
      PPP: introduce the "pptp" module, which implements the point-to-point tunneling protocol using the pppox framework
      NET: introduce the "gre" module for demultiplexing GRE packets by version
           (required so that pptp and ip_gre can coexist)
      NET: ip_gre: update to use the "gre" module
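      The version-based demultiplexing can be sketched as a per-version handler table. It assumes the RFC 2784 header layout, in which the low three bits of the second byte hold the version: 0 for plain GRE (ip_gre) and 1 for the enhanced GRE used by PPTP (RFC 2637). The `demo_gre_*` names and the stub handlers are hypothetical and are not the gre module's API.

```c
#include <stddef.h>
#include <stdint.h>

#define GRE_V0 0      /* plain GRE (RFC 2784), as used by ip_gre */
#define GRE_V1 1      /* enhanced GRE used by PPTP (RFC 2637) */
#define GRE_VMAX 2

typedef int (*gre_handler)(const uint8_t *pkt, size_t len);

/* Hypothetical per-version registration table. */
static gre_handler demo_gre_table[GRE_VMAX];

static int demo_gre_register(int version, gre_handler h)
{
    if (version < 0 || version >= GRE_VMAX || demo_gre_table[version])
        return -1;                    /* bad version or slot taken */
    demo_gre_table[version] = h;
    return 0;
}

/* Dispatch on the version bits; unknown versions are dropped (-1). */
static int demo_gre_dispatch(const uint8_t *pkt, size_t len)
{
    int version;

    if (len < 4)
        return -1;                    /* shorter than the base header */
    version = pkt[1] & 0x07;
    if (version >= GRE_VMAX || !demo_gre_table[version])
        return -1;
    return demo_gre_table[version](pkt, len);
}

/* Stub handlers standing in for ip_gre and pptp. */
static int stub_ip_gre(const uint8_t *pkt, size_t len)
{
    (void)pkt; (void)len;
    return 10;
}

static int stub_pptp(const uint8_t *pkt, size_t len)
{
    (void)pkt; (void)len;
    return 11;
}
```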
      
      This patch introduces PPTP support to the Linux kernel, which
      dramatically speeds up PPTP VPN connections and decreases CPU usage
      compared with the existing user-space implementations
      (poptop/pptpclient). The accel-pptp project
      (https://sourceforge.net/projects/accel-pptp/) utilizes this module;
      it contains a plugin that lets pppd use PPTP in client mode and a
      modified pptpd (poptop) for building a high-performance PPTP NAS.
      
      There were many changes from the initially submitted patch; the most important are:
      1. using rcu instead of read-write locks
      2. using static bitmap instead of dynamically allocated
      3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages
      4. fixed many coding style issues
      Thanks to Eric Dumazet.
      Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 Oct, 2008 1 commit
  6. 07 Mar, 2008 1 commit
    • [UDP]: Revert udplite and code split. · db8dac20
      David S. Miller committed
      This reverts commit db1ed684 ("[IPV6] UDP: Rename IPv6 UDP files."),
      commit 8be8af8f ("[IPV4] UDP: Move IPv4-specific bits to other file.")
      and commit e898d4db ("[UDP]: Allow users to configure UDP-Lite.").
      
      First, udplite costs very little, and it is a core protocol just
      like TCP and normal UDP are.
      
      We spent enormous amounts of effort to make udplite share as much code
      with core UDP as possible.  All of that work is less valuable if we're
      just going to slap a config option on udplite support.
      
      It is also causing build failures, as reported on linux-next, showing
      that the changeset was not tested very well.  In fact, this is the
      second build failure resulting from the udplite change.
      
      Finally, the config option provided was a bool instead of a modular
      option, meaning the udplite code does not even get build-tested
      by allmodconfig builds; furthermore, the user is not presented
      with a reasonable modular build option, which is particularly needed
      by distribution vendors.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Mar, 2008 2 commits
  8. 29 Jan, 2008 1 commit
    • [IPV4]: Cleanup the sysctl_net_ipv4.c file · 9ba63979
      Pavel Emelyanov committed
      This includes several cleanups:
      
       * tune Makefile to compile out this file when SYSCTL=n. Now
         it looks like net/core/sysctl_net_core.c one;
       * move ipv4_config to af_inet.c so that it always exists;
       * remove additional sysctl_ip_nonlocal_bind declaration
         (it is already declared in net/ip.h);
       * remove no longer needed ifdefs from this file.
      
      This is a preparation for using ctl paths for net/ipv4/
      sysctl table.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 16 Oct, 2007 1 commit
    • [INET]: Collect frag queues management objects together · 7eb95156
      Pavel Emelyanov committed
      There are some objects that are common in all the places
      which are used to keep track of frag queues, they are:
      
       * hash table
       * LRU list
       * rw lock
       * rnd number for hash function
       * the number of queues
       * the amount of memory occupied by queues
       * secret timer
      
      Move all this stuff into one structure (struct inet_frags)
      to make it possible to use them uniformly in the future. As
      with the previous patch, this mostly consists of hunks like
      
      -    write_lock(&ipfrag_lock);
      +    write_lock(&ip4_frags.lock);
      
      To address the issue with exporting the number of queues and
      the amount of memory occupied by queues outside the .c file
      they are declared in, I introduce a couple of helpers.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 11 Oct, 2007 1 commit
  11. 11 Jul, 2007 1 commit
  12. 26 Apr, 2007 4 commits
  13. 03 Dec, 2006 1 commit
    • [NET]: Supporting UDP-Lite (RFC 3828) in Linux · ba4e58ec
      Gerrit Renker committed
      This is a revision of the previously submitted patch, which alters
      the way files are organized and compiled in the following manner:
      
      	* UDP and UDP-Lite now use separate object files
      	* source file dependencies resolved via header files
      	  net/ipv{4,6}/udp_impl.h
      	* order of inclusion files in udp.c/udplite.c adapted
      	  accordingly
      
      [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)
      
      This patch adds support for UDP-Lite to the IPv4 stack, provided as an
      extension to the existing UDPv4 code:
              * generic routines are all located in net/ipv4/udp.c
              * UDP-Lite specific routines are in net/ipv4/udplite.c
              * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
              * shared API with extensions for partial checksum coverage
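      How the partial checksum coverage is interpreted can be sketched as follows, following RFC 3828: a Checksum Coverage of 0 means the whole datagram, and values that do not cover the 8-byte header or that exceed the datagram length are illegal. `udplite_coverage` is a hypothetical helper, not the code in net/ipv4/udplite.c.

```c
#include <stddef.h>
#include <stdint.h>

#define UDPLITE_HDR_LEN 8

/* Return how many bytes the checksum must cover, or -1 if the
 * coverage value is illegal and the datagram should be discarded. */
static long udplite_coverage(uint16_t cscov, size_t datagram_len)
{
    if (datagram_len < UDPLITE_HDR_LEN)
        return -1;                          /* truncated header */
    if (cscov == 0)
        return (long)datagram_len;          /* 0 = full coverage */
    if (cscov < UDPLITE_HDR_LEN || (size_t)cscov > datagram_len)
        return -1;                          /* must cover header, fit packet */
    return (long)cscov;
}
```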
      
      [NET/IPv6]: Extension for UDP-Lite over IPv6
      
      It extends the existing UDPv6 code base with support for UDP-Lite
      in the same manner as for UDPv4. In particular,
              * UDPv6 generic and shared code is in net/ipv6/udp.c
              * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
              * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
              * support for IPV6_ADDRFORM
              * aligned the coding style of protocol initialisation with af_inet6.c
              * made the error handling in udpv6_queue_rcv_skb consistent,
                returning `-1' in all error cases
              * consolidation of shared code
      
      [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support
      
      The UDP-Lite patch further provides
              * API documentation for UDP-Lite
              * basic xfrm support
              * basic netfilter support for IPv4 and IPv6 (LOG target)
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 04 Oct, 2006 1 commit
  15. 23 Sep, 2006 1 commit
    • [NetLabel]: CIPSOv4 engine · 446fda4f
      Paul Moore committed
      Add support for the Commercial IP Security Option (CIPSO) to the IPv4
      network stack.  CIPSO has become a de-facto standard for
      trusted/labeled networking amongst existing Trusted Operating Systems
      such as Trusted Solaris, HP-UX CMW, etc.  This implementation is
      designed to be used with the NetLabel subsystem to provide explicit
      packet labeling to LSM developers.
      
      The CIPSO/IPv4 packet labeling works by the LSM calling a NetLabel API
      function which attaches a CIPSO label (IPv4 option) to a given socket;
      this in turn attaches the CIPSO label to every packet leaving the
      socket without any extra processing on the outbound side.  On the
      inbound side the individual packet's sk_buff is examined through a
      call to a NetLabel API function to determine if a CIPSO/IPv4 label is
      present and if so the security attributes of the CIPSO label are
      returned to the caller of the NetLabel API function.
      Signed-off-by: Paul Moore <paul.moore@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 11 Jul, 2006 1 commit
  17. 18 Jun, 2006 5 commits
  18. 29 Mar, 2006 1 commit
    • [INET]: Introduce tunnel4/tunnel6 · d2acc347
      Herbert Xu committed
      Basically this patch moves the generic tunnel protocol stuff out of
      xfrm4_tunnel/xfrm6_tunnel and into the new files tunnel4.c
      and tunnel6.c respectively.
      
      The reason for this is that the problem that Hugo uncovered is only
      the tip of the iceberg.  The real problem is that when we removed the
      dependency of ipip on xfrm4_tunnel we didn't really consider the module
      case at all.
      
      For instance, as things stand it's possible to build both ipip and
      xfrm4_tunnel as modules, and if the latter is loaded then ipip simply
      won't load.
      
      After considering the alternatives I've decided that the best way out of
      this is to restore the dependency of ipip on the non-xfrm-specific part
      of xfrm4_tunnel.  This is acceptable IMHO because the intention of the
      removal was really to be able to use ipip without the xfrm subsystem.
      This is still preserved by this patch.
      
      So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
      the arbitration between the two.  The order of processing is determined
      by a simple integer which ensures that ipip gets processed before
      xfrm4_tunnel.
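      The arbitration can be pictured as a handler list kept sorted by an integer priority, where the first handler that accepts the packet wins. All names, the priority values, and the string "packet kinds" below are hypothetical stand-ins; the sketch only shows why a lower number lets ipip see packets before xfrm4_tunnel.

```c
#include <stddef.h>
#include <string.h>

struct tunnel_handler {
    const char *name;
    int priority;                     /* lower value runs first */
    int (*handler)(const char *kind); /* 0 = consumed, else pass on */
    struct tunnel_handler *next;
};

static struct tunnel_handler *tunnel_list;

/* Insert h keeping ascending priority order; a duplicate priority is
 * rejected so the processing order is always well defined. */
static int tunnel_register(struct tunnel_handler *h)
{
    struct tunnel_handler **pp = &tunnel_list;

    while (*pp && (*pp)->priority < h->priority)
        pp = &(*pp)->next;
    if (*pp && (*pp)->priority == h->priority)
        return -1;
    h->next = *pp;
    *pp = h;
    return 0;
}

/* Offer the packet to each handler in order; return the name of the
 * handler that consumed it, or NULL if none did. */
static const char *tunnel_rcv(const char *kind)
{
    struct tunnel_handler *h;

    for (h = tunnel_list; h; h = h->next)
        if (h->handler(kind) == 0)
            return h->name;
    return NULL;
}

/* Stubs: ipip only takes plain IP-in-IP, xfrm takes everything else. */
static int ipip_stub(const char *kind)
{
    return strcmp(kind, "ipip") == 0 ? 0 : -1;
}

static int xfrm_stub(const char *kind)
{
    (void)kind;
    return 0;
}
```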
      
      The situation for ICMP handling is a little bit more complicated since
      we may not have enough information to determine who it's for.  It's not
      a big deal at the moment since the xfrm ICMP handlers are basically
      no-ops.  In future we can deal with this when we look at ICMP caching
      in general.
      
      The user-visible effect of this is the removal of the TUNNEL Kconfig
      prompts.  This makes sense because, as it stands, it can only be used
      through IPCOMP.
      
      The addition of the new modules shouldn't introduce any problems since
      module dependency will cause them to be loaded.
      
      Oh and I also turned some unnecessary pskb's in IPv6 related to this
      patch to skb's.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 11 Jan, 2006 1 commit
  20. 04 Jan, 2006 1 commit
  21. 30 Aug, 2005 7 commits
  22. 28 Jul, 2005 1 commit
  23. 24 Jun, 2005 4 commits
    • [TCP]: Add Scalable TCP congestion control module. · 0e57976b
      John Heffner committed
      This patch implements Tom Kelly's Scalable TCP congestion control algorithm
      for the modular framework.
      
      The algorithm has some nice scaling properties and has been used a fair bit
      in research, though it is known to have significant fairness issues, so it's
      not really suitable for general-purpose use.
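      A floating-point sketch of the Scalable TCP update rule, assuming the parameters from Kelly's paper (increase by a = 0.01 segments per ACK, decrease by a fraction b = 0.125 on loss). The function names are hypothetical, and the kernel module's integer arithmetic is not reproduced here.

```c
/* Scalable TCP parameters as given in Kelly's paper (assumed here). */
#define STCP_A 0.01   /* additive increase per ACK, in segments */
#define STCP_B 0.125  /* multiplicative decrease fraction on loss */

/* Congestion avoidance: every ACK adds a fixed a segments, so the
 * window grows by about a * cwnd per RTT. */
static double stcp_on_ack(double cwnd)
{
    return cwnd + STCP_A;
}

/* Loss: shrink the window by the fixed fraction b. */
static double stcp_on_loss(double cwnd)
{
    return cwnd - STCP_B * cwnd;
}
```

      Because growth per RTT is proportional to cwnd (about 1% per RTT), recovering from a loss takes roughly ln(1/0.875)/ln(1.01) ≈ 13.4 RTTs whatever the window size, which is the scaling property mentioned above.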
      Signed-off-by: John Heffner <jheffner@psc.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Add H-TCP congestion control module. · a7868ea6
      Baruch Even committed
      H-TCP is a congestion control algorithm developed at the Hamilton Institute by
      Douglas Leith and Robert Shorten. It extends the standard Reno algorithm
      with mode switching and is thus a relatively simple modification.
      
      H-TCP is defined in a layered manner as it is still a research platform. The
      basic form includes the modification of beta according to the ratio of maxRTT
      to minRTT and the alpha=2*factor*(1-beta) relation, where factor is dependent
      on the time since the last congestion event.
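      The basic relations can be written out directly. This sketch assumes the published H-TCP description: beta = minRTT/maxRTT clamped to [0.5, 0.8], a low-speed threshold of 1 s after a congestion event, and factor = 1 + 10t + (t/2)^2 where t is the time beyond that threshold. It uses doubles and hypothetical names rather than the module's fixed-point code.

```c
#define HTCP_DELTA_L 1.0   /* seconds of Reno-like behaviour after loss */

/* Time-based factor: 1 for the first HTCP_DELTA_L seconds after a
 * congestion event, then growing with the elapsed time. */
static double htcp_factor(double delta)
{
    double t;

    if (delta <= HTCP_DELTA_L)
        return 1.0;
    t = delta - HTCP_DELTA_L;
    return 1.0 + 10.0 * t + (t / 2.0) * (t / 2.0);
}

/* Backoff factor from the RTT ratio, clamped to [0.5, 0.8]. */
static double htcp_beta(double min_rtt, double max_rtt)
{
    double beta = min_rtt / max_rtt;

    if (beta < 0.5)
        beta = 0.5;
    if (beta > 0.8)
        beta = 0.8;
    return beta;
}

/* Per-RTT additive increase: alpha = 2 * factor * (1 - beta). */
static double htcp_alpha(double delta, double min_rtt, double max_rtt)
{
    return 2.0 * htcp_factor(delta) * (1.0 - htcp_beta(min_rtt, max_rtt));
}
```

      With factor = 1 and beta = 0.5, alpha is 1, i.e. H-TCP behaves like Reno shortly after a congestion event.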
      
      The other layers improve convergence by adding appropriate factors to alpha.
      
      The following patch implements the H-TCP algorithm in its basic form.
      Signed-off-by: Baruch Even <baruch@ev-en.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Add TCP Vegas congestion control module. · b87d8561
      Stephen Hemminger committed
      TCP Vegas code modified for the new TCP infrastructure.
      Vegas now uses microsecond-resolution timestamps for
      better estimation of performance over higher-speed links.
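      A sketch of the classic Vegas decision made once per RTT, with RTTs in microseconds as the commit describes. The quantity diff = cwnd × (RTT − BaseRTT) / RTT estimates how many of the flow's packets are sitting in queues. The thresholds (alpha = 2 and beta = 4 packets in the usage below) are assumptions passed in by the caller, and `vegas_update` is a hypothetical name.

```c
/* One Vegas congestion-avoidance step. base_rtt and rtt are in
 * microseconds; alpha and beta are queue-occupancy thresholds in
 * packets. Returns the new congestion window. */
static unsigned vegas_update(unsigned cwnd, double base_rtt, double rtt,
                             double alpha, double beta)
{
    /* Expected minus actual throughput, scaled by base_rtt. */
    double diff = cwnd * (rtt - base_rtt) / rtt;

    if (diff < alpha)
        return cwnd + 1;        /* path under-used: grow linearly */
    if (diff > beta && cwnd > 2)
        return cwnd - 1;        /* queues building up: back off */
    return cwnd;                /* within [alpha, beta]: hold */
}
```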
      Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Add TCP Hybla congestion control module. · 835b3f0c
      Daniele Lacamera committed
      TCP Hybla congestion avoidance.
      
      - "In heterogeneous networks, TCP connections that incorporate a
      terrestrial or satellite radio link are greatly disadvantaged with
      respect to entirely wired connections, because of their longer round
      trip times (RTTs). To cope with this problem, a new TCP proposal, the
      TCP Hybla, is presented and discussed in the paper[1]. It stems from an
      analytical evaluation of the congestion window dynamics in the TCP
      standard versions (Tahoe, Reno, NewReno), which suggests the necessary
      modifications to remove the performance dependence on RTT.[...]"[1]
      
      [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for
      heterogeneous networks", International Journal of Satellite
      Communications and Networking, Volume 22, Issue 5, Pages 547-566,
      September 2004.
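      The RTT normalization behind Hybla can be sketched as below, assuming the paper's reference RTT0 of 25 ms and clamping rho to at least 1. Only the congestion-avoidance increment (rho²/cwnd per ACK) is shown; the slow-start rule (2^rho − 1 per ACK) is omitted. The names are hypothetical.

```c
#define HYBLA_RTT0_MS 25.0   /* reference RTT, assumed from the paper */

/* Normalized RTT; clamped to 1 so short-RTT flows behave like Reno. */
static double hybla_rho(double rtt_ms)
{
    double rho = rtt_ms / HYBLA_RTT0_MS;

    return rho < 1.0 ? 1.0 : rho;
}

/* Congestion-avoidance increment per ACK: rho^2 / cwnd, so a full
 * window of ACKs adds rho^2 segments per RTT (Reno adds 1). */
static double hybla_ca_increment(double cwnd, double rtt_ms)
{
    double rho = hybla_rho(rtt_ms);

    return rho * rho / cwnd;
}
```

      For a 600 ms satellite path, rho = 24 and the window grows by 576 segments per RTT, so cwnd/RTT rises at the same rate as for a Reno flow with a 25 ms RTT.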
      
      Signed-off-by: Daniele Lacamera (root at danielinux.net)
      Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>