1. 16 4月, 2012 1 次提交
  2. 05 4月, 2012 1 次提交
    • T
      sctp: Allow struct sctp_event_subscribe to grow without breaking binaries · acdd5985
      Thomas Graf 提交于
      getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
      an error if the user provides less bytes than the size of struct
      sctp_event_subscribe.
      
      Struct sctp_event_subscribe needs to be extended by an u8 for every
      new event or notification type that is added.
      
      This obviously makes getsockopt fail for binaries that are compiled
      against an older versions of <net/sctp/user.h> which do not contain
      all event types.
      
      This patch changes getsockopt behaviour to no longer return an error
      if not enough bytes are being provided by the user. Instead, it
      returns as much of sctp_event_subscribe as fits into the provided buffer.
      
      This leads to the new behavior that users see what they have been aware
      of at compile time.
      
      The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acdd5985
  3. 09 3月, 2012 1 次提交
  4. 21 12月, 2011 1 次提交
    • T
      sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd · a76c0adf
      Thomas Graf 提交于
      When checking whether a DATA chunk fits into the estimated rwnd a
      full sizeof(struct sk_buff) is added to the needed chunk size. This
      quickly exhausts the available rwnd space and leads to packets being
      sent which are much below the PMTU limit. This can lead to much worse
      performance.
      
      The reason for this behaviour was to avoid putting too much memory
      pressure on the receiver. The concept is not completely irational
      because a Linux receiver does in fact clone an skb for each DATA chunk
      delivered. However, Linux also reserves half the available socket
      buffer space for data structures therefore usage of it is already
      accounted for.
      
      When proposing to change this the last time it was noted that this
      behaviour was introduced to solve a performance issue caused by rwnd
      overusage in combination with small DATA chunks.
      
      Trying to reproduce this I found that with the sk_buff overhead removed,
      the performance would improve significantly unless socket buffer limits
      are increased.
      
      The following numbers have been gathered using a patched iperf
      supporting SCTP over a live 1 Gbit ethernet network. The -l option
      was used to limit DATA chunk sizes. The numbers listed are based on
      the average of 3 test runs each. Default values have been used for
      sk_(r|w)mem.
      
      Chunk
      Size    Unpatched     No Overhead
      -------------------------------------
         4    15.2 Kbit [!]   12.2 Mbit [!]
         8    35.8 Kbit [!]   26.0 Mbit [!]
        16    95.5 Kbit [!]   54.4 Mbit [!]
        32   106.7 Mbit      102.3 Mbit
        64   189.2 Mbit      188.3 Mbit
       128   331.2 Mbit      334.8 Mbit
       256   537.7 Mbit      536.0 Mbit
       512   766.9 Mbit      766.6 Mbit
      1024   810.1 Mbit      808.6 Mbit
      Signed-off-by: NThomas Graf <tgraf@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a76c0adf
  5. 20 12月, 2011 1 次提交
  6. 12 12月, 2011 1 次提交
  7. 02 12月, 2011 1 次提交
  8. 30 11月, 2011 1 次提交
    • X
      sctp: better integer overflow check in sctp_auth_create_key() · c89304b8
      Xi Wang 提交于
      The check from commit 30c2235c is incomplete and cannot prevent
      cases like key_len = 0x80000000 (INT_MAX + 1).  In that case, the
      left-hand side of the check (INT_MAX - key_len), which is unsigned,
      becomes 0xffffffff (UINT_MAX) and bypasses the check.
      
      However this shouldn't be a security issue.  The function is called
      from the following two code paths:
      
       1) setsockopt()
      
       2) sctp_auth_asoc_set_secret()
      
      In case (1), sca_keylength is never going to exceed 65535 since it's
      bounded by a u16 from the user API.  As such, the key length will
      never overflow.
      
      In case (2), sca_keylength is computed based on the user key (1 short)
      and 2 * key_vector (3 shorts) for a total of 7 * USHRT_MAX, which still
      will not overflow.
      
      In other words, this overflow check is not really necessary.  Just
      make it more correct.
      Signed-off-by: NXi Wang <xi.wang@gmail.com>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c89304b8
  9. 23 11月, 2011 1 次提交
  10. 09 11月, 2011 2 次提交
  11. 01 11月, 2011 1 次提交
  12. 27 10月, 2011 1 次提交
  13. 14 10月, 2011 1 次提交
    • E
      net: more accurate skb truesize · 87fb4b7b
      Eric Dumazet 提交于
      skb truesize currently accounts for sk_buff struct and part of skb head.
      kmalloc() roundings are also ignored.
      
      Considering that skb_shared_info is larger than sk_buff, its time to
      take it into account for better memory accounting.
      
      This patch introduces SKB_TRUESIZE(X) macro to centralize various
      assumptions into a single place.
      
      At skb alloc phase, we put skb_shared_info struct at the exact end of
      skb head, to allow a better use of memory (lowering number of
      reallocations), since kmalloc() gives us power-of-two memory blocks.
      
      Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
      aligned to cache lines, as before.
      
      Note: This patch might trigger performance regressions because of
      misconfigured protocol stacks, hitting per socket or global memory
      limits that were previously not reached. But its a necessary step for a
      more accurate memory accounting.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <ak@linux.intel.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87fb4b7b
  14. 17 9月, 2011 1 次提交
    • M
      sctp: deal with multiple COOKIE_ECHO chunks · d5ccd496
      Max Matveev 提交于
      Attempt to reduce the number of IP packets emitted in response to single
      SCTP packet (2e3216cd) introduced a complication - if a packet contains
      two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
      socket while processing first COOKIE_ECHO and then loses the association
      and forgets to uncork the socket. To deal with the issue add new SCTP
      command which can be used to set association explictly. Use this new
      command when processing second COOKIE_ECHO chunk to restore the context
      for SCTP state machine.
      Signed-off-by: NMax Matveev <makc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5ccd496
  15. 25 8月, 2011 2 次提交
  16. 15 7月, 2011 1 次提交
  17. 09 7月, 2011 1 次提交
  18. 08 7月, 2011 1 次提交
    • T
      sctp: Enforce retransmission limit during shutdown · f8d96052
      Thomas Graf 提交于
      When initiating a graceful shutdown while having data chunks
      on the retransmission queue with a peer which is in zero
      window mode the shutdown is never completed because the
      retransmission error count is reset periodically by the
      following two rules:
      
       - Do not timeout association while doing zero window probe.
       - Reset overall error count when a heartbeat request has
         been acknowledged.
      
      The graceful shutdown will wait for all outstanding TSN to
      be acknowledged before sending the SHUTDOWN request. This
      never happens due to the peer's zero window not acknowledging
      the continuously retransmitted data chunks. Although the
      error counter is incremented for each failed retransmission,
      the receiving of the SACK announcing the zero window clears
      the error count again immediately. Also heartbeat requests
      continue to be sent periodically. The peer acknowledges these
      requests causing the error counter to be reset as well.
      
      This patch changes behaviour to only reset the overall error
      counter for the above rules while not in shutdown. After
      reaching the maximum number of retransmission attempts, the
      T5 shutdown guard timer is scheduled to give the receiver
      some additional time to recover. The timer is stopped as soon
      as the receiver acknowledges any data.
      
      The issue can be easily reproduced by establishing a sctp
      association over the loopback device, constantly queueing
      data at the sender while not reading any at the receiver.
      Wait for the window to reach zero, then initiate a shutdown
      by killing both processes simultaneously. The association
      will never be freed and the chunks on the retransmission
      queue will be retransmitted indefinitely.
      Signed-off-by: NThomas Graf <tgraf@infradead.org>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8d96052
  19. 07 7月, 2011 2 次提交
  20. 02 7月, 2011 1 次提交
  21. 17 6月, 2011 1 次提交
  22. 12 6月, 2011 1 次提交
  23. 07 6月, 2011 1 次提交
  24. 02 6月, 2011 5 次提交
  25. 01 6月, 2011 1 次提交
  26. 26 5月, 2011 1 次提交
  27. 24 5月, 2011 1 次提交
    • D
      net: convert %p usage to %pK · 71338aa7
      Dan Rosenberg 提交于
      The %pK format specifier is designed to hide exposed kernel pointers,
      specifically via /proc interfaces.  Exposing these pointers provides an
      easy target for kernel write vulnerabilities, since they reveal the
      locations of writable structures containing easily triggerable function
      pointers.  The behavior of %pK depends on the kptr_restrict sysctl.
      
      If kptr_restrict is set to 0, no deviation from the standard %p behavior
      occurs.  If kptr_restrict is set to 1, the default, if the current user
      (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
      (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
       If kptr_restrict is set to 2, kernel pointers using %pK are printed as
      0's regardless of privileges.  Replacing with 0's was chosen over the
      default "(null)", which cannot be parsed by userland %p, which expects
      "(nil)".
      
      The supporting code for kptr_restrict and %pK are currently in the -mm
      tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
      pointers to the syslog are not covered, since this would eliminate useful
      information for postmortem debugging and the reading of the syslog is
      already optionally protected by the dmesg_restrict sysctl.
      Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Thomas Graf <tgraf@infradead.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71338aa7
  28. 21 5月, 2011 1 次提交
  29. 20 5月, 2011 1 次提交
    • J
      SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict() · c182f90b
      Jacek Luczak 提交于
      During the sctp_close() call, we do not use rcu primitives to
      destroy the address list attached to the endpoint.  At the same
      time, we do the removal of addresses from this list before
      attempting to remove the socket from the port hash
      
      As a result, it is possible for another process to find the socket
      in the port hash that is in the process of being closed.  It then
      proceeds to traverse the address list to find the conflict, only
      to have that address list suddenly disappear without rcu() critical
      section.
      
      Fix issue by closing address list removal inside RCU critical
      section.
      
      Race can result in a kernel crash with general protection fault or
      kernel NULL pointer dereference:
      
      kernel: general protection fault: 0000 [#1] SMP
      kernel: RIP: 0010:[<ffffffffa02f3dde>]  [<ffffffffa02f3dde>] sctp_bind_addr_conflict+0x64/0x82 [sctp]
      kernel: Call Trace:
      kernel:  [<ffffffffa02f415f>] ? sctp_get_port_local+0x17b/0x2a3 [sctp]
      kernel:  [<ffffffffa02f3d45>] ? sctp_bind_addr_match+0x33/0x68 [sctp]
      kernel:  [<ffffffffa02f4416>] ? sctp_do_bind+0xd3/0x141 [sctp]
      kernel:  [<ffffffffa02f5030>] ? sctp_bindx_add+0x4d/0x8e [sctp]
      kernel:  [<ffffffffa02f5183>] ? sctp_setsockopt_bindx+0x112/0x4a4 [sctp]
      kernel:  [<ffffffff81089e82>] ? generic_file_aio_write+0x7f/0x9b
      kernel:  [<ffffffffa02f763e>] ? sctp_setsockopt+0x14f/0xfee [sctp]
      kernel:  [<ffffffff810c11fb>] ? do_sync_write+0xab/0xeb
      kernel:  [<ffffffff810e82ab>] ? fsnotify+0x239/0x282
      kernel:  [<ffffffff810c2462>] ? alloc_file+0x18/0xb1
      kernel:  [<ffffffff8134a0b1>] ? compat_sys_setsockopt+0x1a5/0x1d9
      kernel:  [<ffffffff8134aaf1>] ? compat_sys_socketcall+0x143/0x1a4
      kernel:  [<ffffffff810467dc>] ? sysenter_dispatch+0x7/0x32
      Signed-off-by: NJacek Luczak <luczak.jacek@gmail.com>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c182f90b
  30. 13 5月, 2011 2 次提交
  31. 11 5月, 2011 1 次提交
  32. 09 5月, 2011 1 次提交