1. 06 1月, 2016 2 次提交
  2. 31 12月, 2015 1 次提交
    • X
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3
      Xin Long 提交于
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      068d8bd3
  3. 28 12月, 2015 2 次提交
  4. 16 12月, 2015 3 次提交
  5. 12 12月, 2015 1 次提交
  6. 08 12月, 2015 1 次提交
  7. 07 12月, 2015 3 次提交
  8. 06 12月, 2015 3 次提交
  9. 04 12月, 2015 2 次提交
  10. 03 12月, 2015 1 次提交
  11. 02 12月, 2015 2 次提交
    • E
      net: fix sock_wake_async() rcu protection · ceb5d58b
      Eric Dumazet 提交于
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      triggering a fault in sock_wake_async() when async IO is requested.
      
      Said program stressed af_unix sockets, but the issue is generic
      and should be addressed in core networking stack.
      
      The problem is that by the time sock_wake_async() is called,
      we should not access the @flags field of 'struct socket',
      as the inode containing this socket might be freed without
      further notice, and without RCU grace period.
      
      We already maintain an RCU protected structure, "struct socket_wq"
      so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
      is the safe route.
      
      It also reduces number of cache lines needing dirtying, so might
      provide a performance improvement anyway.
      
      In followup patches, we might move remaining flags (SOCK_NOSPACE,
      SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
      being mostly read and let it being shared between cpus.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ceb5d58b
    • E
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet 提交于
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cd3e072
  12. 01 12月, 2015 1 次提交
  13. 16 11月, 2015 1 次提交
  14. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  15. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  16. 05 10月, 2015 1 次提交
  17. 29 9月, 2015 4 次提交
    • V
      net: Drop unlikely before IS_ERR(_OR_NULL) · b5ffe634
      Viresh Kumar 提交于
      IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
      is no need to do that again from its callers. Drop it.
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      b5ffe634
    • D
      net: sctp: Don't use 64 kilobyte lookup table for four elements · 2103d6b8
      Denys Vlasenko 提交于
      Seemingly innocuous sctp_trans_state_to_prio_map[] array
      is way bigger than it looks, since
      "[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" !
      
      This patch replaces it with switch() statement.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      CC: linux-sctp@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2103d6b8
    • K
      sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · 635682a1
      Karl Heiss 提交于
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
      Signed-off-by: NKarl Heiss <kheiss@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      635682a1
    • K
      sctp: Whitespace fix · f05940e6
      Karl Heiss 提交于
      Fix indentation in sctp_generate_heartbeat_event.
      Signed-off-by: NKarl Heiss <kheiss@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f05940e6
  18. 12 9月, 2015 1 次提交
    • M
      sctp: fix race on protocol/netns initialization · 8e2d61e0
      Marcelo Ricardo Leitner 提交于
      Consider sctp module is unloaded and is being requested because an user
      is creating a sctp socket.
      
      During initialization, sctp will add the new protocol type and then
      initialize pernet subsys:
      
              status = sctp_v4_protosw_init();
              if (status)
                      goto err_protosw_init;
      
              status = sctp_v6_protosw_init();
              if (status)
                      goto err_v6_protosw_init;
      
              status = register_pernet_subsys(&sctp_net_ops);
      
      The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
      is possible for userspace to create SCTP sockets like if the module is
      already fully loaded. If that happens, one of the possible effects is
      that we will have readers for net->sctp.local_addr_list list earlier
      than expected and sctp_net_init() does not take precautions while
      dealing with that list, leading to a potential panic but not limited to
      that, as sctp_sock_init() will copy a bunch of blank/partially
      initialized values from net->sctp.
      
      The race happens like this:
      
           CPU 0                           |  CPU 1
        socket()                           |
         __sock_create                     | socket()
          inet_create                      |  __sock_create
           list_for_each_entry_rcu(        |
              answer, &inetsw[sock->type], |
              list) {                      |   inet_create
            /* no hits */                  |
           if (unlikely(err)) {            |
            ...                            |
            request_module()               |
            /* socket creation is blocked  |
             * the module is fully loaded  |
             */                            |
             sctp_init                     |
              sctp_v4_protosw_init         |
               inet_register_protosw       |
                list_add_rcu(&p->list,     |
                             last_perm);   |
                                           |  list_for_each_entry_rcu(
                                           |     answer, &inetsw[sock->type],
              sctp_v6_protosw_init         |     list) {
                                           |     /* hit, so assumes protocol
                                           |      * is already loaded
                                           |      */
                                           |  /* socket creation continues
                                           |   * before netns is initialized
                                           |   */
              register_pernet_subsys       |
      
      Simply inverting the initialization order between
      register_pernet_subsys() and sctp_v4_protosw_init() is not possible
      because register_pernet_subsys() will create a control sctp socket, so
      the protocol must be already visible by then. Deferring the socket
      creation to a work-queue is not good specially because we loose the
      ability to handle its errors.
      
      So, as suggested by Vlad, the fix is to split netns initialization in
      two moments: defaults and control socket, so that the defaults are
      already loaded by when we register the protocol, while control socket
      initialization is kept at the same moment it is today.
      
      Fixes: 4db67e80 ("sctp: Make the address lists per network namespace")
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e2d61e0
  19. 04 9月, 2015 2 次提交
  20. 29 8月, 2015 2 次提交
  21. 28 8月, 2015 2 次提交
  22. 27 7月, 2015 1 次提交
    • D
      net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings · 81296fc6
      Daniel Borkmann 提交于
      Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
      RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
      (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d ("net:
      sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].
      
      Imho, it was not a good idea, and we should just revert that message
      for a couple of reasons:
      
        1) It's uapi and therefore set in stone forever.
      
        2) To be able to run on older and newer kernels, an SCTP application
           would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
           SCTP_RCVINFO support, so that on older kernels, it can make use
           of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
           In my (limited) experience, a lot of SCTP appliances are migrating
           to newer kernels only ve(ee)ry slowly.
      
        3) Some people don't have the chance to change their applications,
           f.e. due to proprietary legacy stuff. So, they'll hit this warning
           in fast path and are stuck with older kernels.
      
      But i.e. due to point 1) I really fail to see the benefit of a warning.
      So just revert that for now, the issue was reported up Jamal.
      
        [1] http://thread.gmane.org/gmane.linux.network/321960/Reported-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Michael Tuexen <tuexen@fh-muenster.de>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81296fc6
  23. 21 7月, 2015 2 次提交
    • M
      sctp: fix cut and paste issue in comment · b52effd2
      Marcelo Ricardo Leitner 提交于
      Cookie ACK is always received by the association initiator, so fix the
      comment to avoid confusion.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b52effd2
    • M
      sctp: fix src address selection if using secondary addresses · 0ca50d12
      Marcelo Ricardo Leitner 提交于
      In short, sctp is likely to incorrectly choose src address if socket is
      bound to secondary addresses. This patch fixes it by adding a new check
      that checks if such src address belongs to the interface that routing
      identified as output.
      
      This is enough to avoid rp_filter drops on remote peer.
      
      Details:
      
      Currently, sctp will do a routing attempt without specifying the src
      address and compare the returned value (preferred source) with the
      addresses that the socket is bound to. When using secondary addresses,
      this will not match.
      
      Then it will try specifying each of the addresses that the socket is
      bound to and re-routing, checking if that address is valid as src for
      that dst. Thing is, this check alone is weak:
      
      # ip r l
      192.168.100.0/24 dev eth1  proto kernel  scope link  src 192.168.100.149
      192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.147
      
      # ip a l
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
             valid_lft 2160sec preferred_lft 2160sec
          inet 192.168.122.148/24 scope global secondary eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe15:186a/64 scope link
             valid_lft forever preferred_lft forever
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
          inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
             valid_lft 2162sec preferred_lft 2162sec
          inet 192.168.100.148/24 scope global secondary eth1
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:feb3:9146/64 scope link
             valid_lft forever preferred_lft forever
      4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
          inet6 fe80::5054:ff:fe05:47ee/64 scope link
             valid_lft forever preferred_lft forever
      
      # ip r g 192.168.100.193 from 192.168.122.148
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Even if you specify an interface:
      
      # ip r g 192.168.100.193 from 192.168.122.148 oif eth1
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Although this would be valid, peers using rp_filter will drop such
      packets as their src doesn't match the routes for that interface.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ca50d12