1. 08 Dec 2012 (1 commit)
  2. 04 Dec 2012 (1 commit)
    • sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call · 196d6759
      Committed by Michele Baldessari
      The current SCTP stack lacks a mechanism for per-association
      statistics. This is an implementation modeled after OpenSolaris'
      SCTP_GET_ASSOC_STATS.
      
      The userspace part will follow in lksctp if/when there is general
      agreement (an ACK) on this.
      V4:
      - Move ipackets++ before q->immediate.func() for consistency reasons
      - Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
        returning bogus RTO values
      - return asoc->rto_min when max_obs_rto value has not changed
      
      V3:
      - Increase ictrlchunks in sctp_assoc_bh_rcv() as well
      - Move ipackets++ to sctp_inq_push()
      - return 0 when no rto updates took place since the last call
      
      V2:
      - Implement partial retrieval of the stat struct to cope with future expansion
      - Kill the rtxpackets counter as it cannot be precise anyway
      - Rename outseqtsns to outofseqtsns to make it clearer that these are out
        of sequence unexpected TSNs
      - Move asoc->ipackets++ under a lock to avoid potential miscounts
      - Fold asoc->opackets++ into the already existing asoc check
      - Kill unneeded (q->asoc) test when increasing rtxchunks
      - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
      - Don't count SHUTDOWNs as SACKs
      - Move SCTP_GET_ASSOC_STATS to the private space API
      - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
        future struct growth
      - Move association statistics in their own struct
      - Update idupchunks when we send a SACK with dup TSNs
      - Return min_rto in max_rto when the RTO has not changed. Also return the
        transport address on which max_rto last changed.
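The partial-retrieval rule from the V2 notes, together with the V4 max_rto behaviour, can be sketched in userspace C. This is a minimal model, not the kernel code: `struct demo_assoc_stats`, `struct demo_assoc` and `demo_get_assoc_stats()` are hypothetical names, and the real getsockopt handler also looks up the association from the id the caller passes in. The sketch copies at most `len` bytes, rejects buffers too small for the id, and resets the observed maximum to `rto_min` after each read, which is one way to report `rto_min` when no larger RTO has been seen since the last call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats layout; NOT the real uapi struct. New fields are
 * only ever appended at the end, so shorter (older) buffers stay valid. */
struct demo_assoc_stats {
    int32_t  assoc_id;   /* first field: identifies the association */
    uint64_t max_rto;    /* highest RTO observed in this period */
    uint64_t ipackets;   /* packets received */
    uint64_t opackets;   /* packets sent */
};

/* Hypothetical kernel-side per-association state. */
struct demo_assoc {
    int32_t  id;
    uint64_t rto_min;
    uint64_t max_obs_rto;
    uint64_t ipackets;
    uint64_t opackets;
};

/* Copy up to `len` bytes of statistics into `out`. Returns the number
 * of bytes copied, or -1 if `len` cannot even hold the association id. */
long demo_get_assoc_stats(struct demo_assoc *a, void *out, size_t len)
{
    struct demo_assoc_stats s;

    assert(a != NULL && out != NULL);
    if (len < sizeof(s.assoc_id))
        return -1;
    if (len > sizeof(s))
        len = sizeof(s);            /* never copy past our own struct */

    memset(&s, 0, sizeof(s));       /* also clears padding bytes */
    s.assoc_id = a->id;
    s.max_rto  = a->max_obs_rto;
    s.ipackets = a->ipackets;
    s.opackets = a->opackets;

    /* Begin a new observation period: until the next RTO update,
     * the reported maximum falls back to rto_min. */
    a->max_obs_rto = a->rto_min;

    memcpy(out, &s, len);
    return (long)len;
}
```

A caller compiled against an older, shorter struct simply passes its own `sizeof`, and the fields it does not know about are never written.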
      
      Signed-off: Michele Baldessari <michele@acksyn.org>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Dec 2012 (2 commits)
    • sctp: verify length provided in heartbeat information parameter · 06a31e2b
      Committed by Thomas Graf
      If the variable parameter length provided in the mandatory
      heartbeat information parameter exceeds the calculated payload
      length, the packet has been corrupted. Reply with a parameter
      length protocol violation message.
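A minimal sketch of the length check, assuming a simplified TLV header in host byte order. `struct demo_param_hdr`, `demo_check_hb_info()` and the verdict enum are hypothetical, not the kernel's helpers; the header-size checks are extra sanity checks added for the sketch, while the comparison against `payload_len` is the check this commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV parameter header; SCTP parameter lengths include
 * the 4-byte type/length header. Host byte order in this sketch. */
struct demo_param_hdr {
    uint16_t type;
    uint16_t length;
};

enum demo_hb_verdict { DEMO_HB_OK, DEMO_HB_PARAM_LEN_VIOLATION };

/* payload_len: bytes of chunk payload actually received
 * (chunk length minus the chunk header). */
enum demo_hb_verdict demo_check_hb_info(const struct demo_param_hdr *p,
                                        size_t payload_len)
{
    assert(p != NULL);
    /* Sketch-only sanity checks: the header itself must fit, and the
     * declared length must at least cover the header. */
    if (payload_len < sizeof(*p) || p->length < sizeof(*p))
        return DEMO_HB_PARAM_LEN_VIOLATION;
    /* The declared parameter length must not exceed what the chunk
     * actually carries; otherwise the packet is corrupt. */
    if (p->length > payload_len)
        return DEMO_HB_PARAM_LEN_VIOLATION;
    return DEMO_HB_OK;
}
```

On `DEMO_HB_PARAM_LEN_VIOLATION` the receiver would answer with the parameter-length protocol violation instead of echoing the heartbeat.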
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix CONFIG_SCTP_DBG_MSG=y null pointer dereference in sctp_v6_get_dst() · ee3f34e8
      Committed by Tommi Rantala
      Trinity (the syscall fuzzer) triggered the following BUG, reproducible
      only when the kernel is configured with CONFIG_SCTP_DBG_MSG=y.
      
      When CONFIG_SCTP_DBG_MSG is not set, the null pointer is never
      dereferenced.
      
      ---[ end trace a4de0bfcb38a3642 ]---
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
      IP: [<ffffffff8136796e>] ip6_string+0x1e/0xa0
      PGD 4eead067 PUD 4e472067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      CPU 3
      Pid: 21324, comm: trinity-child11 Tainted: G        W    3.7.0-rc7+ #61 ASUSTeK Computer INC. EB1012/EB1012
      RIP: 0010:[<ffffffff8136796e>]  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
      RSP: 0018:ffff88004e4637a0  EFLAGS: 00010046
      RAX: ffff88004e4637da RBX: ffff88004e4637da RCX: 0000000000000000
      RDX: ffffffff8246e92a RSI: 0000000000000100 RDI: ffff88004e4637da
      RBP: ffff88004e4637a8 R08: 000000000000ffff R09: 000000000000ffff
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8289d600
      R13: ffffffff8289d230 R14: ffffffff8246e928 R15: ffffffff8289d600
      FS:  00007fed95153700(0000) GS:ffff88005fd80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000100 CR3: 000000004eeac000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process trinity-child11 (pid: 21324, threadinfo ffff88004e462000, task ffff8800524b0000)
      Stack:
       ffff88004e4637da ffff88004e463828 ffffffff81368eee 000000004e4637d8
       ffffffff0000ffff ffff88000000ffff 0000000000000000 000000004e4637f8
       ffffffff826285d8 ffff88004e4637f8 0000000000000000 ffff8800524b06b0
      Call Trace:
       [<ffffffff81368eee>] ip6_addr_string.isra.11+0x3e/0xa0
       [<ffffffff81369183>] pointer.isra.12+0x233/0x2d0
       [<ffffffff810a413a>] ? vprintk_emit+0x1ba/0x450
       [<ffffffff8110953d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
       [<ffffffff81369757>] vsnprintf+0x187/0x5d0
       [<ffffffff81369c62>] vscnprintf+0x12/0x30
       [<ffffffff810a4028>] vprintk_emit+0xa8/0x450
       [<ffffffff81e5cb00>] printk+0x49/0x4b
       [<ffffffff81d17221>] sctp_v6_get_dst+0x731/0x780
       [<ffffffff81d16e15>] ? sctp_v6_get_dst+0x325/0x780
       [<ffffffff81d00a96>] sctp_transport_route+0x46/0x120
       [<ffffffff81cff0f1>] sctp_assoc_add_peer+0x161/0x350
       [<ffffffff81d0fd8d>] sctp_sendmsg+0x6cd/0xcb0
       [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
       [<ffffffff81b55cfb>] inet_sendmsg+0x10b/0x220
       [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
       [<ffffffff81a72a64>] ? sock_update_classid+0xa4/0x2b0
       [<ffffffff81a72ab0>] ? sock_update_classid+0xf0/0x2b0
       [<ffffffff81a6ac1c>] sock_sendmsg+0xdc/0xf0
       [<ffffffff8118e9e5>] ? might_fault+0x85/0x90
       [<ffffffff8118e99c>] ? might_fault+0x3c/0x90
       [<ffffffff81a6e12a>] sys_sendto+0xfa/0x130
       [<ffffffff810a9887>] ? do_setitimer+0x197/0x380
       [<ffffffff81e960d5>] ? sysret_check+0x22/0x5d
       [<ffffffff81e960a9>] system_call_fastpath+0x16/0x1b
      Code: 01 eb 89 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 f8 31 c9 48 89 e5 53 eb 12 0f 1f 40 00 48 83 c1 01 48 83 c0 04 48 83 f9 08 74 70 <0f> b6 3c 4e 89 fb 83 e7 0f c0 eb 04 41 89 d8 41 83 e0 0f 0f b6
      RIP  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
       RSP <ffff88004e4637a0>
      CR2: 0000000000000100
      ---[ end trace a4de0bfcb38a3643 ]---
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 29 Nov 2012 (3 commits)
  5. 21 Nov 2012 (1 commit)
    • sctp: send abort chunk when max_retrans exceeded · de4594a5
      Committed by Neil Horman
      In the event that an association exceeds its max_retrans attempts, we should
      send an ABORT chunk indicating that we are closing the association as a result.
      Because of the nature of the error, it's unlikely to be received, but it's a nice,
      clean way to close the association if it does make it through, and it will give
      anyone watching via tcpdump a clue as to what happened.
      
      Change notes:
      v2)
      	* Removed erroneous changes from sctp_make_violation_parmlen
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 19 Nov 2012 (1 commit)
  7. 18 Nov 2012 (1 commit)
  8. 16 Nov 2012 (1 commit)
  9. 04 Nov 2012 (1 commit)
    • sctp: Clean up type-punning in sctp_cmd_t union · b26ddd81
      Committed by Neil Horman
      Many places in the sctp_cmd_interpreter function treat the sctp_cmd_t arg as
      a void pointer, even though they are written as various other types.  There's no
      need for this, as doing so just leads to possible type-punning issues that could
      cause crashes, and if we remain type-consistent we can remove the
      void * member of the union entirely.
      
      Change Notes:
      
      v2)
      	* Dropped chunk that modified SCTP_NULL to create a marker pattern
      	 should anyone try to use a SCTP_NULL()-assigned sctp_arg_t; assigning
      	 to .zero provides the same effect and should be faster, per Vlad Y.
      
      v3)
      	* Reverted part of v2, opting to use memset instead of .zero, so that
      	 the entire union is initialized, thus avoiding the ia64 speculative load
      	 problems previously encountered, per Dave M.  Also rewrote
      	 SCTP_[NO]FORCE so as to use common infrastructure a little more.
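A sketch of the pattern described above: a union with only typed members (no `void *` catch-all) and constructor functions that `memset` the whole union before setting one member, so no byte is ever left uninitialised. The names (`demo_arg_t`, `demo_arg_u32()`, `demo_arg_chunk()`, `demo_arg_null()`, `struct demo_chunk`) are hypothetical stand-ins, not the kernel's actual `sctp_arg_t` helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_chunk { int id; };   /* hypothetical payload type */

/* Every member has a concrete type; readers use the member that
 * matches the command instead of casting through void *. */
typedef union {
    int32_t            i32;
    uint32_t           u32;
    uint8_t            u8;
    struct demo_chunk *chunk;
} demo_arg_t;

/* Constructors zero the entire union first, then set one member. */
static inline demo_arg_t demo_arg_u32(uint32_t v)
{
    demo_arg_t a;
    memset(&a, 0, sizeof(a));
    a.u32 = v;
    return a;
}

static inline demo_arg_t demo_arg_chunk(struct demo_chunk *c)
{
    demo_arg_t a;
    assert(c != NULL);
    memset(&a, 0, sizeof(a));
    a.chunk = c;
    return a;
}

/* The "no argument" value: an all-zero union. */
static inline demo_arg_t demo_arg_null(void)
{
    demo_arg_t a;
    memset(&a, 0, sizeof(a));
    return a;
}
```

Returning the union by value keeps call sites short while the memset guarantees that the bytes not covered by the chosen member are defined.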
      
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 01 Nov 2012 (1 commit)
  11. 26 Oct 2012 (1 commit)
    • sctp: Make hmac algorithm selection for cookie generation dynamic · 3c68198e
      Committed by Neil Horman
      Currently sctp allows for the optional use of md5 or sha1 hmac algorithms to
      generate cookie values when establishing new connections via two build-time
      config options.  There's no real reason to make this a static selection.  We can
      add a sysctl that allows for the dynamic selection of these algorithms at run
      time, with the default value determined by the corresponding crypto library
      availability.
      This comes in handy, for example, when running a system in FIPS mode, where use
      of md5 is disallowed but SHA1 is permitted.
      
      Note: This new sysctl has no corresponding socket option to select the cookie
      hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
      contains no option for this value, and I opted not to pollute the socket option
      namespace.
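A userspace sketch of run-time selection with an availability-based default. All names here (`enum demo_hmac`, `struct demo_crypto_caps`, `demo_default_hmac()`, `demo_set_cookie_hmac()`) are hypothetical, and preferring md5 over sha1 when both are available is an assumption, since the message only says the default follows crypto availability. In mainline the knob is documented as `net.sctp.cookie_hmac_alg`, accepting `md5`, `sha1` or `none`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum demo_hmac { DEMO_HMAC_NONE, DEMO_HMAC_MD5, DEMO_HMAC_SHA1 };

/* Which algorithms the (hypothetical) crypto layer provides; in FIPS
 * mode md5 would be reported as unavailable. */
struct demo_crypto_caps {
    bool md5;
    bool sha1;
};

/* Default chosen from availability; the md5-first order is assumed. */
enum demo_hmac demo_default_hmac(struct demo_crypto_caps caps)
{
    if (caps.md5)
        return DEMO_HMAC_MD5;
    if (caps.sha1)
        return DEMO_HMAC_SHA1;
    return DEMO_HMAC_NONE;
}

/* Handle a write of `name` to the sysctl. Unknown or unavailable
 * algorithms are rejected with -1 and *cur is left unchanged. */
int demo_set_cookie_hmac(enum demo_hmac *cur, const char *name,
                         struct demo_crypto_caps caps)
{
    assert(cur != NULL && name != NULL);
    if (strcmp(name, "md5") == 0 && caps.md5)
        *cur = DEMO_HMAC_MD5;
    else if (strcmp(name, "sha1") == 0 && caps.sha1)
        *cur = DEMO_HMAC_SHA1;
    else if (strcmp(name, "none") == 0)
        *cur = DEMO_HMAC_NONE;
    else
        return -1;
    return 0;
}
```

Under a FIPS-like configuration (`md5` unavailable) the default falls through to sha1, and an attempt to switch back to md5 at run time is refused.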
      
      Change notes:
      v2)
      	* Updated subject to have the proper sctp prefix as per Dave M.
      	* Replaced default selection options with new options that allow
      	  developers to explicitly select available hmac algs at build time
      	  as per suggestion by Vlad Y.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 17 Oct 2012 (1 commit)
  13. 05 Oct 2012 (2 commits)
  14. 27 Sep 2012 (1 commit)
  15. 05 Sep 2012 (1 commit)
  16. 04 Sep 2012 (1 commit)
    • sctp: Don't charge for data in sndbuf again when transmitting packet · 4c3a5bda
      Committed by Thomas Graf
      SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via
      skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of
      sndbuf, it will sleep in sctp_wait_for_sndbuf() and expects to be woken up
      by __sctp_write_space().
      
      Buffer space charged via sctp_set_owner_w() is released in sctp_wfree()
      which calls __sctp_write_space() directly.
      
      Buffer space charged via skb_set_owner_w() is released via sock_wfree()
      which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set.
      sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets.
      
      Therefore if sctp_packet_transmit() manages to queue up more than sndbuf
      bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is
      interrupted by a signal.
      
      This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ...
      
      Charging for the data twice does not make sense in the first place; it
      leads to overcharging sndbuf by a factor of 2. Therefore this patch only
      charges a single byte in wmem_alloc when transmitting an SCTP packet to
      ensure that the socket stays alive until the packet has been released.
      
      This means that control chunks are no longer accounted for in wmem_alloc,
      which I believe is not a problem, as skb->truesize will typically lead
      to overcharging anyway and thus compensates for any control overhead.
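A toy model of the accounting change, assuming a single `wmem_alloc` counter and ignoring `skb->truesize`; `struct demo_sock`, `demo_sendmsg_charge()` and `demo_transmit_charge()` are hypothetical. With the old double charge, 400 bytes of data in a 1000-byte sndbuf already account for 800 bytes, so the next 400-byte send must wait; charging one byte at transmit time leaves room for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a socket's write-memory accounting. */
struct demo_sock {
    size_t wmem_alloc;   /* bytes currently charged to the socket */
    size_t sndbuf;       /* send buffer limit */
};

/* sendmsg path: charge the user data once. Returns -1 when the
 * caller would have to sleep waiting for buffer space. */
int demo_sendmsg_charge(struct demo_sock *sk, size_t len)
{
    assert(sk != NULL);
    if (sk->wmem_alloc + len > sk->sndbuf)
        return -1;
    sk->wmem_alloc += len;
    return 0;
}

/* transmit path: the old behaviour charged the packet length again;
 * the new behaviour charges a single byte just to pin the socket until
 * the packet is released. Returns the amount charged. */
size_t demo_transmit_charge(struct demo_sock *sk, size_t pkt_len,
                            int old_behaviour)
{
    size_t charge = old_behaviour ? pkt_len : 1;

    assert(sk != NULL);
    sk->wmem_alloc += charge;
    return charge;
}
```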
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 17 Aug 2012 (2 commits)
  18. 15 Aug 2012 (17 commits)
  19. 01 Aug 2012 (1 commit)
    • netvm: prevent a stream-specific deadlock · c76562b6
      Committed by Mel Gorman
      This patch series is based on top of "Swap-over-NBD without deadlocking
      v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.
      
      When a user or administrator requires swap for their application, they
      create a swap partition or file, format it with mkswap and activate it
      with swapon.  In diskless systems this is not an option, so if swap is
      required, swapping over the network is considered.  The two likely
      scenarios are blade servers used as part of a cluster, where the
      form factor or maintenance costs do not allow the use of disks, and thin
      clients.
      
      The Linux Terminal Server Project recommends the use of the Network Block
      Device (NBD) for swap but this is not always an option.  There is no
      guarantee that the network attached storage (NAS) device is running Linux
      or supports NBD.  However, it is likely that it supports NFS so there are
      users that want support for swapping over NFS despite any performance
      concern.  Some distributions currently carry patches that support swapping
      over NFS but it would be preferable to support it in the mainline kernel.
      
      Patch 1 avoids a stream-specific deadlock that potentially affects TCP.
      
      Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
      	reserves.
      
      Patch 3 adds three helpers for filesystems to handle swap cache pages.
      	For example, page_file_mapping() returns page->mapping for
      	file-backed pages and the address_space of the underlying
      	swap file for swap cache pages.
      
      Patch 4 adds two address_space_operations to allow a filesystem
      	to pin all metadata relevant to a swapfile in memory. Upon
      	successful activation, the swapfile is marked SWP_FILE and
      	the address space operation ->direct_IO is used for writing
      	and ->readpage for reading in swap pages.
      
      Patch 5 notes that patch 3 is bolting
      	filesystem-specific-swapfile-support onto the side and that
      	the default handlers have different information to what
      	is available to the filesystem. This patch refactors the
      	code so that there are generic handlers for each of the new
      	address_space operations.
      
      Patch 6 adds an API to allow a vector of kernel addresses to be
      	translated to struct pages and pinned for IO.
      
      Patch 7 adds support for using highmem pages for swap by kmapping
      	the pages before calling the direct_IO handler.
      
      Patch 8 updates NFS to use the helpers from patch 3 where necessary.
      
      Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.
      
      Patch 10 implements the new swapfile-related address_space operations
      	for NFS and teaches the direct IO handler how to manage
      	kernel addresses.
      
      Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
      	where appropriate.
      
      Patch 12 fixes a NULL pointer dereference that occurs when using
      	swap-over-NFS.
      
      With the patches applied, it is possible to mount a swapfile that is on an
      NFS filesystem.  Swap performance is not great, with a swap stress test
      taking roughly twice as long to complete as it does when the swap device is
      backed by NBD.
      
      This patch: netvm: prevent a stream-specific deadlock
      
      It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
      that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
      sockets from receiving data, which will prevent userspace from running,
      and userspace must run to reduce the buffered data.
      
      Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
      this change is applied, it is important that sockets that set
      SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
      If this happens, a warning is generated and the tokens are reclaimed to avoid
      accounting errors until the bug is fixed.
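A simplified admission check illustrating the exemption, assuming one global receive-memory counter (`struct demo_rmem`) and a boolean standing in for SOCK_MEMALLOC. Both types and `demo_rmem_admit()` are hypothetical; the kernel's real memory accounting is considerably more involved. This sketch still records the admitted bytes so they can be uncharged later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global receive-memory accounting. */
struct demo_rmem {
    size_t allocated;   /* bytes currently charged system-wide */
    size_t limit;       /* global rmem limit */
};

struct demo_sock {
    bool memalloc;      /* stand-in for the SOCK_MEMALLOC flag */
};

/* Returns true if `len` bytes may be queued on `sk`. Ordinary sockets
 * are held to the global limit; SOCK_MEMALLOC sockets (e.g. the one
 * carrying swap traffic) bypass it so they can still receive the data
 * needed to make progress and free memory. */
bool demo_rmem_admit(struct demo_rmem *r, const struct demo_sock *sk,
                     size_t len)
{
    assert(r != NULL && sk != NULL);
    if (!sk->memalloc && r->allocated + len > r->limit)
        return false;
    r->allocated += len;   /* still charged, so it can be uncharged later */
    return true;
}
```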
      
      [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>