1. 09 7月, 2011 1 次提交
  2. 08 7月, 2011 1 次提交
    • T
      sctp: Enforce retransmission limit during shutdown · f8d96052
      Thomas Graf 提交于
      When initiating a graceful shutdown while having data chunks
      on the retransmission queue with a peer which is in zero
      window mode the shutdown is never completed because the
      retransmission error count is reset periodically by the
      following two rules:
      
       - Do not timeout association while doing zero window probe.
       - Reset overall error count when a heartbeat request has
         been acknowledged.
      
      The graceful shutdown will wait for all outstanding TSN to
      be acknowledged before sending the SHUTDOWN request. This
      never happens due to the peer's zero window not acknowledging
      the continuously retransmitted data chunks. Although the
      error counter is incremented for each failed retransmission,
      the receiving of the SACK announcing the zero window clears
      the error count again immediately. Also heartbeat requests
      continue to be sent periodically. The peer acknowledges these
      requests causing the error counter to be reset as well.
      
      This patch changes behaviour to only reset the overall error
      counter for the above rules while not in shutdown. After
      reaching the maximum number of retransmission attempts, the
      T5 shutdown guard timer is scheduled to give the receiver
      some additional time to recover. The timer is stopped as soon
      as the receiver acknowledges any data.
      
      The issue can be easily reproduced by establishing a sctp
      association over the loopback device, constantly queueing
      data at the sender while not reading any at the receiver.
      Wait for the window to reach zero, then initiate a shutdown
      by killing both processes simultaneously. The association
      will never be freed and the chunks on the retransmission
      queue will be retransmitted indefinitely.
      Signed-off-by: NThomas Graf <tgraf@infradead.org>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8d96052
  3. 02 7月, 2011 1 次提交
    • D
      ipv6: Don't put artificial limit on routing table size. · 957c665f
      David S. Miller 提交于
      IPV6, unlike IPV4, doesn't have a routing cache.
      
      Routing table entries, as well as clones made in response
      to route lookup requests, all live in the same table.  And
      all of these things are together collected in the destination
      cache table for ipv6.
      
      This means that routing table entries count against the garbage
      collection limits, even though such entries cannot ever be reclaimed
      and are added explicitly by the administrator (rather than being
      created in response to lookups).
      
      Therefore it makes no sense to count ipv6 routing table entries
      against the GC limits.
      
      Add a DST_NOCOUNT destination cache entry flag, and skip the counting
      if it is set.  Use this flag bit in ipv6 when adding routing table
      entries.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      957c665f
  4. 28 6月, 2011 2 次提交
  5. 16 6月, 2011 1 次提交
  6. 13 6月, 2011 1 次提交
    • A
      Delay struct net freeing while there's a sysfs instance refering to it · a685e089
      Al Viro 提交于
      	* new refcount in struct net, controlling actual freeing of the memory
      	* new method in kobj_ns_type_operations (->drop_ns())
      	* ->current_ns() semantics change - it's supposed to be followed by
      corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
      the new refcount; net_drop_ns() decrements it and calls net_free() if the
      last reference has been dropped.  Method renamed to ->grab_current_ns().
      	* old net_free() callers call net_drop_ns() instead.
      	* sysfs_exit_ns() is gone, along with a large part of callchain
      leading to it; now that the references stored in ->ns[...] stay valid we
      do not need to hunt them down and replace them with NULL.  That fixes
      problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
      of sb->s_instances abuse.
      
      	Note that struct net *shutdown* logics has not changed - net_cleanup()
      is called exactly when it used to be called.  The only thing postponed by
      having a sysfs instance refering to that struct net is actual freeing of
      memory occupied by struct net.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a685e089
  7. 01 6月, 2011 1 次提交
  8. 28 5月, 2011 2 次提交
  9. 27 5月, 2011 1 次提交
  10. 25 5月, 2011 5 次提交
  11. 23 5月, 2011 2 次提交
  12. 20 5月, 2011 1 次提交
    • E
      ipv6: reduce per device ICMP mib sizes · be281e55
      Eric Dumazet 提交于
      ipv6 has per device ICMP SNMP counters, taking too much space because
      they use percpu storage.
      
      needed size per device is :
      (512+4)*sizeof(long)*number_of_possible_cpus*2
      
      On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of
      memory per ipv6 enabled network device, taken in vmalloc pool.
      
      Since ICMP messages are rare, just use shared counters (atomic_long_t)
      
      Per network space ICMP counters are still using percpu memory, we might
      also convert them to shared counters in a future patch.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be281e55
  13. 19 5月, 2011 5 次提交
  14. 18 5月, 2011 1 次提交
  15. 17 5月, 2011 3 次提交
  16. 16 5月, 2011 8 次提交
  17. 14 5月, 2011 3 次提交
    • D
      ipv4: Remove route key identity dependencies in ip_rt_get_source(). · 8e36360a
      David S. Miller 提交于
      Pass in the sk_buff so that we can fetch the necessary keys from
      the packet header when working with input routes.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e36360a
    • V
      net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7
      Vasiliy Kulikov 提交于
      This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
      ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
      without any special privileges.  In other words, the patch makes it
      possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
      order not to increase the kernel's attack surface, the new functionality
      is disabled by default, but is enabled at bootup by supporting Linux
      distributions, optionally with restriction to a group or a group range
      (see below).
      
      Similar functionality is implemented in Mac OS X:
      http://www.manpagez.com/man/4/icmp/
      
      A new ping socket is created with
      
          socket(PF_INET, SOCK_DGRAM, PROT_ICMP)
      
      Message identifiers (octets 4-5 of ICMP header) are interpreted as local
      ports. Addresses are stored in struct sockaddr_in. No port numbers are
      reserved for privileged processes, port 0 is reserved for API ("let the
      kernel pick a free number"). There is no notion of remote ports, remote
      port numbers provided by the user (e.g. in connect()) are ignored.
      
      Data sent and received include ICMP headers. This is deliberate to:
      1) Avoid the need to transport headers values like sequence numbers by
      other means.
      2) Make it easier to port existing programs using raw sockets.
      
      ICMP headers given to send() are checked and sanitized. The type must be
      ICMP_ECHO and the code must be zero (future extensions might relax this,
      see below). The id is set to the number (local port) of the socket, the
      checksum is always recomputed.
      
      ICMP reply packets received from the network are demultiplexed according
      to their id's, and are returned by recv() without any modifications.
      IP header information and ICMP errors of those packets may be obtained
      via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
      quenches and redirects are reported as fake errors via the error queue
      (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
      network order).
      
      socket(2) is restricted to the group range specified in
      "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
      that nobody (not even root) may create ping sockets.  Setting it to "100
      100" would grant permissions to the single group (to either make
      /sbin/ping g+s and owned by this group or to grant permissions to the
      "netadmins" group), "0 4294967295" would enable it for the world, "100
      4294967295" would enable it for the users, but not daemons.
      
      The existing code might be (in the unlikely case anyone needs it)
      extended rather easily to handle other similar pairs of ICMP messages
      (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
      etc.).
      
      Userspace ping util & patch for it:
      http://openwall.info/wiki/people/segoon/ping
      
      For Openwall GNU/*/Linux it was the last step on the road to the
      setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
      is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
      http://mirrors.kernel.org/openwall/Owl/current/iso/
      
      Initially this functionality was written by Pavel Kankovsky for
      Linux 2.4.32, but unfortunately it was never made public.
      
      All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
      the patch.
      
      PATCH v3:
          - switched to flowi4.
          - minor changes to be consistent with raw sockets code.
      
      PATCH v2:
          - changed ping_debug() to pr_debug().
          - removed CONFIG_IP_PING.
          - removed ping_seq_fops.owner field (unused for procfs).
          - switched to proc_net_fops_create().
          - switched to %pK in seq_printf().
      
      PATCH v1:
          - fixed checksumming bug.
          - CAP_NET_RAW may not create icmp sockets anymore.
      
      RFC v2:
          - minor cleanups.
          - introduced sysctl'able group range to restrict socket(2).
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c319b4d7
    • V
      bonding,llc: Fix structure sizeof incompatibility for some PDUs · a10e1466
      Vitalii Demianets 提交于
      With some combinations of arch/compiler (e.g. arm-linux-gcc) the sizeof
      operator on structure returns value greater than expected. In cases when the
      structure is used for mapping PDU fields it may lead to unexpected results
      (such as holes and alignment problems in skb data). __packed prevents this
      undesired behavior.
      Signed-off-by: NVitalii Demianets <vitas@nppfactor.kiev.ua>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a10e1466
  18. 13 5月, 2011 1 次提交