1. 15 8月, 2012 6 次提交
  2. 31 7月, 2012 1 次提交
  3. 21 7月, 2012 1 次提交
  4. 20 7月, 2012 1 次提交
    • E
      ipv4: tcp: remove per net tcp_sock · be9f4a44
      Eric Dumazet 提交于
      tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
      per network namespace.
      
      This leads to bad behavior on multiqueue NICS, because many cpus
      contend for the socket lock and once socket lock is acquired, extra
      false sharing on various socket fields slow down the operations.
      
      To better resist to attacks, we use a percpu socket. Each cpu can
      run without contention, using appropriate memory (local node)
      
      Additional features :
      
      1) We also mirror the queue_mapping of the incoming skb, so that
      answers use the same queue if possible.
      
      2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()
      
      3) We now limit the number of in-flight RST/ACK [1] packets
      per cpu, instead of per namespace, and we honor the sysctl_wmem_default
      limit dynamically. (Prior to this patch, sysctl_wmem_default value was
      copied at boot time, so any further change would not affect tcp_sock
      limit)
      
      [1] These packets are only generated when no socket was matched for
      the incoming packet.
      Reported-by: NBill Sommerfeld <wsommerfeld@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be9f4a44
  5. 11 7月, 2012 1 次提交
    • D
      tcp: Maintain dynamic metrics in local cache. · 51c5d0c4
      David S. Miller 提交于
      Maintain a local hash table of TCP dynamic metrics blobs.
      
      Computed TCP metrics are no longer maintained in the route metrics.
      
      The table uses RCU and an extremely simple hash so that it has low
      latency and low overhead.  A simple hash is legitimate because we only
      make metrics blobs for fully established connections.
      
      Some tweaking of the default hash table sizes, metric timeouts, and
      the hash chain length limit certainly could use some tweaking.  But
      the basic design seems sound.
      
      With help from Eric Dumazet and Joe Perches.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51c5d0c4
  6. 06 7月, 2012 1 次提交
  7. 09 6月, 2012 1 次提交
  8. 07 6月, 2012 7 次提交
  9. 09 5月, 2012 1 次提交
    • E
      netfilter: nf_ct_helper: allow to disable automatic helper assignment · a9006892
      Eric Leblond 提交于
      This patch allows you to disable automatic conntrack helper
      lookup based on TCP/UDP ports, eg.
      
      echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
      
      [ Note: flows that already got a helper will keep using it even
        if automatic helper assignment has been disabled ]
      
      Once this behaviour has been disabled, you have to explicitly
      use the iptables CT target to attach helper to flows.
      
      There are good reasons to stop supporting automatic helper
      assignment, for further information, please read:
      
      http://www.netfilter.org/news.html#2012-04-03
      
      This patch also adds one message to inform that automatic helper
      assignment is deprecated and it will be removed soon (this is
      spotted only once, with the first flow that gets a helper attached
      to make it as less annoying as possible).
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a9006892
  10. 21 4月, 2012 1 次提交
  11. 16 4月, 2012 1 次提交
  12. 05 3月, 2012 1 次提交
    • P
      BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Paul Gortmaker 提交于
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define) then
      that header really should be including <linux/bug.h> and not just
      expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
  13. 28 1月, 2012 1 次提交
  14. 13 12月, 2011 1 次提交
  15. 12 12月, 2011 1 次提交
  16. 22 11月, 2011 1 次提交
    • P
      netfilter: nf_conntrack: make event callback registration per-netns · 70e9942f
      Pablo Neira Ayuso 提交于
      This patch fixes an oops that can be triggered following this recipe:
      
      0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
      1) container is started.
      2) connect to it via lxc-console.
      3) generate some traffic with the container to create some conntrack
         entries in its table.
      4) stop the container: you hit one oops because the conntrack table
         cleanup tries to report the destroy event to user-space but the
         per-netns nfnetlink socket has already gone (as the nfnetlink
         socket is per-netns but event callback registration is global).
      
      To fix this situation, we make the ctnl_notifier per-netns so the
      callback is registered/unregistered if the container is
      created/destroyed.
      
      Alex Bligh and Alexey Dobriyan originally proposed one small patch to
      check if the nfnetlink socket is gone in nfnetlink_has_listeners,
      but this is a very visited path for events, thus, it may reduce
      performance and it looks a bit hackish to check for the nfnetlink
      socket only to workaround this situation. As a result, I decided
      to follow the bigger path choice, which seems to look nicer to me.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Reported-by: NAlex Bligh <alex@alex.org.uk>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      70e9942f
  17. 14 11月, 2011 1 次提交
    • E
      ipv6: reduce percpu needs for icmpv6msg mibs · 2a24444f
      Eric Dumazet 提交于
      Reading /proc/net/snmp6 on a machine with a lot of cpus is very
      expensive (can be ~88000 us).
      
      This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
      values for all possible cpus can read 16 Mbytes of memory (32MBytes on
      non x86 arches)
      
      ICMP messages are not considered as fast path on a typical server, and
      eventually few cpus handle them anyway. We can afford an atomic
      operation instead of using percpu data.
      
      This saves 4096 bytes per cpu and per network namespace.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a24444f
  18. 10 11月, 2011 1 次提交
    • E
      ipv4: reduce percpu needs for icmpmsg mibs · acb32ba3
      Eric Dumazet 提交于
      Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
      (can be ~88000 us).
      
      This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
      for all possible cpus can read 16 Mbytes of memory.
      
      ICMP messages are not considered as fast path on a typical server, and
      eventually few cpus handle them anyway. We can afford an atomic
      operation instead of using percpu data.
      
      This saves 4096 bytes per cpu and per network namespace.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acb32ba3
  19. 27 7月, 2011 1 次提交
  20. 14 5月, 2011 1 次提交
    • V
      net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7
      Vasiliy Kulikov 提交于
      This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
      ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
      without any special privileges.  In other words, the patch makes it
      possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
      order not to increase the kernel's attack surface, the new functionality
      is disabled by default, but is enabled at bootup by supporting Linux
      distributions, optionally with restriction to a group or a group range
      (see below).
      
      Similar functionality is implemented in Mac OS X:
      http://www.manpagez.com/man/4/icmp/
      
      A new ping socket is created with
      
          socket(PF_INET, SOCK_DGRAM, PROT_ICMP)
      
      Message identifiers (octets 4-5 of ICMP header) are interpreted as local
      ports. Addresses are stored in struct sockaddr_in. No port numbers are
      reserved for privileged processes, port 0 is reserved for API ("let the
      kernel pick a free number"). There is no notion of remote ports, remote
      port numbers provided by the user (e.g. in connect()) are ignored.
      
      Data sent and received include ICMP headers. This is deliberate to:
      1) Avoid the need to transport headers values like sequence numbers by
      other means.
      2) Make it easier to port existing programs using raw sockets.
      
      ICMP headers given to send() are checked and sanitized. The type must be
      ICMP_ECHO and the code must be zero (future extensions might relax this,
      see below). The id is set to the number (local port) of the socket, the
      checksum is always recomputed.
      
      ICMP reply packets received from the network are demultiplexed according
      to their id's, and are returned by recv() without any modifications.
      IP header information and ICMP errors of those packets may be obtained
      via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
      quenches and redirects are reported as fake errors via the error queue
      (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
      network order).
      
      socket(2) is restricted to the group range specified in
      "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
      that nobody (not even root) may create ping sockets.  Setting it to "100
      100" would grant permissions to the single group (to either make
      /sbin/ping g+s and owned by this group or to grant permissions to the
      "netadmins" group), "0 4294967295" would enable it for the world, "100
      4294967295" would enable it for the users, but not daemons.
      
      The existing code might be (in the unlikely case anyone needs it)
      extended rather easily to handle other similar pairs of ICMP messages
      (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
      etc.).
      
      Userspace ping util & patch for it:
      http://openwall.info/wiki/people/segoon/ping
      
      For Openwall GNU/*/Linux it was the last step on the road to the
      setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
      is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
      http://mirrors.kernel.org/openwall/Owl/current/iso/
      
      Initially this functionality was written by Pavel Kankovsky for
      Linux 2.4.32, but unfortunately it was never made public.
      
      All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
      the patch.
      
      PATCH v3:
          - switched to flowi4.
          - minor changes to be consistent with raw sockets code.
      
      PATCH v2:
          - changed ping_debug() to pr_debug().
          - removed CONFIG_IP_PING.
          - removed ping_seq_fops.owner field (unused for procfs).
          - switched to proc_net_fops_create().
          - switched to %pK in seq_printf().
      
      PATCH v1:
          - fixed checksumming bug.
          - CAP_NET_RAW may not create icmp sockets anymore.
      
      RFC v2:
          - minor cleanups.
          - introduced sysctl'able group range to restrict socket(2).
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c319b4d7
  21. 25 3月, 2011 1 次提交
  22. 15 3月, 2011 1 次提交
  23. 19 1月, 2011 1 次提交
    • P
      netfilter: nf_conntrack_tstamp: add flow-based timestamp extension · a992ca2a
      Pablo Neira Ayuso 提交于
      This patch adds flow-based timestamping for conntracks. This
      conntrack extension is disabled by default. Basically, we use
      two 64-bits variables to store the creation timestamp once the
      conntrack has been confirmed and the other to store the deletion
      time. This extension is disabled by default, to enable it, you
      have to:
      
      echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
      
      This patch allows to save memory for user-space flow-based
      loogers such as ulogd2. In short, ulogd2 does not need to
      keep a hashtable with the conntrack in user-space to know
      when they were created and destroyed, instead we use the
      kernel timestamp. If we want to have a sane IPFIX implementation
      in user-space, this nanosecs resolution timestamps are also
      useful. Other custom user-space applications can benefit from
      this via libnetfilter_conntrack.
      
      This patch modifies the /proc output to display the delta time
      in seconds since the flow start. You can also obtain the
      flow-start date by means of the conntrack-tools.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      a992ca2a
  24. 14 1月, 2011 1 次提交
  25. 13 1月, 2011 5 次提交