1. 05 9月, 2013 1 次提交
    • A
      net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 · c08751c8
      Alexander Sverdlin 提交于
      net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4
      
      Initially the problem was observed with ipsec, but later it became clear that
      SCTP data chunk fragmentation algorithm has problems with MTU values which are
      not multiple of 4. Test program was used which just transmits 2000 bytes long
      packets to other host. tcpdump was used to observe re-fragmentation in IP layer
      after SCTP already fragmented data chunks.
      
      With MTU 1500:
      12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
          10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
      12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
          10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
      12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
      
      With MTU 1499:
      13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
          10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
      13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
          10.151.38.153 > 10.151.24.91: ip-proto-132
      13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
          10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
      13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
      
      Here problem in data portion limit calculation leads to re-fragmentation in IP,
      which is sub-optimal. The problem is max_data initial value, which doesn't take
      into account the fact, that data chunk must be padded to 4-bytes boundary.
      It's enough to correct max_data, because all later adjustments are correctly
      aligned to 4-bytes boundary.
      
      After the fix is applied, everything is fragmented correctly for uneven MTUs:
      15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
          10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
      15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
          10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
      15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
          10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
      
      The bug was there for years already, but
       - is a performance issue, the packets are still transmitted
       - doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)
      Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nsn.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c08751c8
  2. 10 8月, 2013 1 次提交
  3. 25 7月, 2013 1 次提交
  4. 02 7月, 2013 1 次提交
    • D
      net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann 提交于
      We should get rid of all own SCTP debug printk macros and use the ones
      that the kernel offers anyway instead. This makes the code more readable
      and conform to the kernel code, and offers all the features of dynamic
      debbuging that pr_debug() et al has, such as only turning on/off portions
      of debug messages at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completly turned off, then these
      statements will also compile into "empty" functions.
      
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs resp. where appropriate BUG_ONs so
      that those bugs can be more easily detected as probably not many people
      have SCTP debugging permanently turned on.
      
      To turn on all SCTP debugging, the following steps are needed:
      
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
      
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb33381d
  5. 18 6月, 2013 1 次提交
  6. 29 11月, 2012 2 次提交
    • T
      sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall · 6e51fe75
      Tommi Rantala 提交于
      Consider the following program, that sets the second argument to the
      sendto() syscall incorrectly:
      
       #include <string.h>
       #include <arpa/inet.h>
       #include <sys/socket.h>
      
       int main(void)
       {
               int fd;
               struct sockaddr_in sa;
      
               fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
               if (fd < 0)
                       return 1;
      
               memset(&sa, 0, sizeof(sa));
               sa.sin_family = AF_INET;
               sa.sin_addr.s_addr = inet_addr("127.0.0.1");
               sa.sin_port = htons(11111);
      
               sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
      
               return 0;
       }
      
      We get -ENOMEM:
      
       $ strace -e sendto ./demo
       sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)
      
      Propagate the error code from sctp_user_addto_chunk(), so that we will
      tell user space what actually went wrong:
      
       $ strace -e sendto ./demo
       sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
      
      Noticed while running Trinity (the syscall fuzzer).
      Signed-off-by: NTommi Rantala <tt.rantala@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e51fe75
    • T
      sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails · be364c8c
      Tommi Rantala 提交于
      Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
      reproducible e.g. with the sendto() syscall by passing invalid
      user space pointer in the second argument:
      
       #include <string.h>
       #include <arpa/inet.h>
       #include <sys/socket.h>
      
       int main(void)
       {
               int fd;
               struct sockaddr_in sa;
      
               fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
               if (fd < 0)
                       return 1;
      
               memset(&sa, 0, sizeof(sa));
               sa.sin_family = AF_INET;
               sa.sin_addr.s_addr = inet_addr("127.0.0.1");
               sa.sin_port = htons(11111);
      
               sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
      
               return 0;
       }
      
      As far as I can tell, the leak has been around since ~2003.
      Signed-off-by: NTommi Rantala <tt.rantala@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be364c8c
  7. 15 8月, 2012 1 次提交
  8. 27 8月, 2010 1 次提交
  9. 01 5月, 2010 1 次提交
  10. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  11. 24 11月, 2009 1 次提交
    • W
      sctp: implement the sender side for SACK-IMMEDIATELY extension · b93d6471
      Wei Yongjun 提交于
      This patch implement the sender side for SACK-IMMEDIATELY
      extension.
      
        Section 4.1.  Sender Side Considerations
      
        Whenever the sender of a DATA chunk can benefit from the
        corresponding SACK chunk being sent back without delay, the sender
        MAY set the I-bit in the DATA chunk header.
      
        Reasons for setting the I-bit include
      
        o  The sender is in the SHUTDOWN-PENDING state.
      
        o  The application requests to set the I-bit of the last DATA chunk
           of a user message when providing the user message to the SCTP
           implementation.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      b93d6471
  12. 05 9月, 2009 4 次提交
    • V
      sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Vlad Yasevich 提交于
      SCTP will delay the last part of a large write due to NAGLE, if that
      part is smaller then MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting untill the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better to not delay it.
      
      This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      cb95ea32
    • V
      sctp: Send user messages to the lower layer as one · 9c5c62be
      Vlad Yasevich 提交于
      Currenlty, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a sigle packet (ex: due
      to user setting a low fragmentation threashold).
      
      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue
      thus causing the chunks to get transmitted.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      9c5c62be
    • V
      sctp: Try to encourage SACK bundling with DATA. · 5d7ff261
      Vlad Yasevich 提交于
      If the association has a SACK timer pending and now DATA queued
      to be send, we'll try to bundle the SACK with the next application send.
      As such, try encourage bundling by accounting for SACK in the size
      of the first chunk fragment.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      5d7ff261
    • V
      sctp: Fix data segmentation with small frag_size · 3e62abf9
      Vlad Yasevich 提交于
      Since an application may specify the maximum SCTP fragment size
      that all data should be fragmented to, we need to fix how
      we do segmentation.   Right now, if a user specifies a small
      fragment size, the segment size can go negative in the presence
      of AUTH or COOKIE_ECHO bundling.
      
      What we need to do is track the largest possbile DATA chunk that
      can fit into the mtu.  Then if the fragment size specified is
      bigger then this maximum length, we'll shrink it down.  Otherwise,
      we just use the smaller segment size without changing it further.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      3e62abf9
  13. 10 4月, 2008 1 次提交
  14. 24 3月, 2008 1 次提交
  15. 06 3月, 2008 1 次提交
  16. 05 2月, 2008 1 次提交
  17. 11 10月, 2007 1 次提交
    • V
      [SCTP]: Enable the sending of the AUTH chunk. · 4cd57c80
      Vlad Yasevich 提交于
      SCTP-AUTH, Section 6.2:
      
         Endpoints MUST send all requested chunks authenticated where this has
         been requested by the peer.  The other chunks MAY be sent
         authenticated or not.  If endpoint pair shared keys are used, one of
         them MUST be selected for authentication.
      
         To send chunks in an authenticated way, the sender MUST include these
         chunks after an AUTH chunk.  This means that a sender MUST bundle
         chunks in order to authenticate them.
      
         If the endpoint has no endpoint pair shared key for the peer, it MUST
         use Shared Key Identifier 0 with an empty endpoint pair shared key.
         If there are multiple endpoint shared keys the sender selects one and
         uses the corresponding Shared Key Identifier
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cd57c80
  18. 09 5月, 2007 1 次提交
  19. 09 10月, 2005 1 次提交
  20. 12 7月, 2005 1 次提交
  21. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4