1. 15 Nov, 2021 2 commits
    • net/smc: Transfer remaining wait queue entries during fallback · 2153bd1e
      Wen Gu committed
      The SMC fallback is currently incomplete. Some wait queue entries
      may remain in the smc socket->wq; these should be transferred to
      clcsocket->wq during the fallback.
      
      For example, in nginx/wrk benchmark, this issue causes an
      all-zeros test result:
      
      server: nginx -g 'daemon off;'
      client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html
      
        Running 5s test @ http://11.200.15.93/index.html
          1 threads and 1 connections
          Thread Stats   Avg      Stdev     Max   ± Stdev
            Latency     0.00us    0.00us   0.00us    -nan%
            Req/Sec     0.00      0.00     0.00      -nan%
          0 requests in 5.00s, 0.00B read
        Requests/sec:      0.00
        Transfer/sec:       0.00B
      
      The all-zeros result occurs because, when wrk uses SMC in place
      of TCP, it adds an eppoll_entry to the smc socket->wq and expects
      to be notified when epoll events such as EPOLLIN/EPOLLOUT occur
      on the smc socket.
      
      However, once a fallback occurs, wrk switches to the clcsocket.
      From then on it is clcsocket->wq, not the smc socket->wq, that
      is woken up. The eppoll_entry left in the smc socket->wq never
      fires again, and wrk stops the test.
      
      This patch fixes the issue by moving the remaining wait queue
      entries from the smc socket->wq to clcsocket->wq during the fallback.
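The transfer described here can be pictured with a small userspace sketch. It assumes a simplified singly linked wait queue; `struct waiter`, `struct wait_queue` and `transfer_waiters()` are hypothetical names for illustration only and are not the kernel's wait queue API.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's wait queue types. */
struct waiter {
	int id;
	struct waiter *next;
};

struct wait_queue {
	struct waiter *head;
};

/* Move every entry from src to the tail of dst; return the number moved. */
int transfer_waiters(struct wait_queue *src, struct wait_queue *dst)
{
	struct waiter **tail = &dst->head;
	int moved = 0;

	while (*tail)                  /* walk to the end of dst */
		tail = &(*tail)->next;

	*tail = src->head;             /* splice the whole src list onto dst */
	for (struct waiter *w = src->head; w; w = w->next)
		moved++;

	src->head = NULL;              /* src no longer owns any entries */
	return moved;
}
```

After the splice, a wakeup delivered to the destination queue reaches the waiters that originally registered on the source queue, which is the property the fallback path needs.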
      
      Link: https://www.spinics.net/lists/netdev/msg779769.html
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2153bd1e
    • tipc: use consistent GFP flags · 86c3a3e9
      Tadeusz Struk committed
      Some functions, like tipc_crypto_start(), use inconsistent GFP flags
      when allocating memory. That function uses GFP_ATOMIC to allocate
      a crypto instance, and then calls alloc_ordered_workqueue(),
      which allocates memory with GFP_KERNEL. The tipc_aead_init() function
      even uses GFP_KERNEL and GFP_ATOMIC interchangeably.
      No doc comment specifies what context a function is designed to
      work in, but the flags should at least be consistent within a function.
      
      Cc: Jon Maloy <jmaloy@redhat.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: tipc-discussion@lists.sourceforge.net
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86c3a3e9
  2. 14 Nov, 2021 1 commit
    • net,lsm,selinux: revert the security_sctp_assoc_established() hook · 1aa3b220
      Paul Moore committed
      This patch reverts two prior patches, e7310c94
      ("security: implement sctp_assoc_established hook in selinux") and
      7c2ef024 ("security: add sctp_assoc_established hook"), which
      create the security_sctp_assoc_established() LSM hook and provide a
      SELinux implementation.  Unfortunately these two patches were merged
      without proper review (the Reviewed-by and Tested-by tags from
      Richard Haines were for previous revisions of these patches that
      were significantly different) and there are outstanding objections
      from the SELinux maintainers regarding these patches.
      
      Work is currently ongoing to correct the problems identified in the
      reverted patches, as well as others that have come up during review,
      but it is unclear at this point in time when that work will be ready
      for inclusion in the mainline kernel.  In the interest of not keeping
      objectionable code in the kernel for multiple weeks, and potentially
      a kernel release, we are reverting the two problematic patches.
      
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aa3b220
  3. 13 Nov, 2021 1 commit
    • tcp: Fix uninitialized access in skb frags array for Rx 0cp. · 70701b83
      Arjun Roy committed
      TCP Receive zerocopy iterates through the SKB queue via
      tcp_recv_skb(), acquiring a pointer to an SKB and an offset within
      that SKB to read from. From there, it iterates the SKB frags array to
      determine which offset to start remapping pages from.
      
      However, this is built on the assumption that the offset read so far
      within the SKB is smaller than the SKB length. If this assumption is
      violated, we can attempt to read an invalid frags array element, which
      would cause a fault.
      
      tcp_recv_skb() can cause such an SKB to be returned when the TCP FIN
      flag is set. Therefore, we must guard against this occurrence inside
      skb_advance_to_frag().
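A minimal sketch of such a guard, assuming a simplified fragment array with no linear header area. `struct frag` and `advance_to_frag()` are hypothetical names and do not reproduce the kernel function.

```c
#include <stddef.h>

/* Hypothetical, simplified model of an skb's page fragments. */
struct frag {
	unsigned int size;
};

/*
 * Return the fragment containing byte `offset` and store the offset within
 * that fragment in *frag_off. Return NULL when `offset` lies at or beyond
 * the end of the data, rather than a pointer past the end of the array.
 */
const struct frag *advance_to_frag(const struct frag *frags, size_t nr_frags,
				   unsigned int offset, unsigned int *frag_off)
{
	for (size_t i = 0; i < nr_frags; i++) {
		if (offset < frags[i].size) {
			*frag_off = offset;
			return &frags[i];
		}
		offset -= frags[i].size;
	}
	return NULL; /* offset == total length (e.g. only a FIN remains) or larger */
}
```

A caller then treats NULL as "nothing mappable left" instead of dereferencing the result unconditionally.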
      
      One way that we can reproduce this error follows:
      1) In a receiver program, call getsockopt(TCP_ZEROCOPY_RECEIVE) with:
           char some_array[32 * 1024];
           struct tcp_zerocopy_receive zc = {
             .copybuf_address = (__u64) &some_array[0],
             .copybuf_len = 32 * 1024,
           };
      
      2) In a sender program, after a TCP handshake, send the following
      sequence of packets:
        i) Seq = [X, X+4000]
        ii) Seq = [X+4000, X+5000]
        iii) Seq = [X+4000, X+5000], Flags = FIN | URG, urgptr=1000
      
      (This can happen without URG, if we have a signal pending, but URG is
      a convenient way to reproduce the behaviour).
      
      In this case, the following event sequence will occur on the receiver:
      
      tcp_zerocopy_receive():
      -> receive_fallback_to_copy() // copybuf_len >= inq
      -> tcp_recvmsg_locked() // reads 5000 bytes, then breaks due to URG
      -> tcp_recv_skb() // yields skb with skb->len == offset
      -> tcp_zerocopy_set_hint_for_skb()
      -> skb_advance_to_frag() // returns a frags ptr. >= nr_frags
      -> find_next_mappable_frag() // will dereference this bad frags ptr.
      
      With this patch, skb_advance_to_frag() will no longer return an
      invalid frags pointer, and will return NULL instead, fixing the issue.
      
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Link: https://lore.kernel.org/r/20211111235215.2605384-1-arjunroy.kdev@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      70701b83
  4. 11 Nov, 2021 1 commit
    • net: fix premature exit from NAPI state polling in napi_disable() · 0315a075
      Alexander Lobakin committed
      Commit 719c5719 ("net: make napi_disable() symmetric with
      enable") accidentally introduced a bug that sometimes leads to a
      kernel BUG when bringing an iface up/down under heavy traffic load.
      
      Prior to this commit, napi_disable() polled n->state until
      neither NAPIF_STATE_SCHED nor NAPIF_STATE_NPSVC was set, and then
      always flipped them. Now it is possible to leave the loop with
      NAPIF_STATE_SCHED unset, because 'continue' drops us to the
      cmpxchg() call with an uninitialized variable rather than straight
      into another round of the state check.
      
      Error path looks like:
      
      napi_disable():
      unsigned long val, new; /* new is uninitialized */
      
      do {
      	val = READ_ONCE(n->state); /* NAPIF_STATE_NPSVC and/or
      				      NAPIF_STATE_SCHED is set */
      	if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { /* true */
      		usleep_range(20, 200);
      		continue; /* go straight to the condition check */
      	}
      	new = val | <...>
      } while (cmpxchg(&n->state, val, new) != val); /* state == val, cmpxchg()
      						  writes garbage */
      
      napi_enable():
      do {
      	val = READ_ONCE(n->state);
      	BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); /* 50/50 boom */
      <...>
      
      while the typical BUG splat is like:
      
      [  172.652461] ------------[ cut here ]------------
      [  172.652462] kernel BUG at net/core/dev.c:6937!
      [  172.656914] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [  172.661966] CPU: 36 PID: 2829 Comm: xdp_redirect_cp Tainted: G          I       5.15.0 #42
      [  172.670222] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
      [  172.680646] RIP: 0010:napi_enable+0x5a/0xd0
      [  172.684832] Code: 07 49 81 cc 00 01 00 00 4c 89 e2 48 89 d8 80 e6 fb f0 48 0f b1 55 10 48 39 c3 74 10 48 8b 5d 10 f6 c7 04 75 3d f6 c3 01 75 b4 <0f> 0b 5b 5d 41 5c c3 65 ff 05 b8 e5 61 53 48 c7 c6 c0 f3 34 ad 48
      [  172.703578] RSP: 0018:ffffa3c9497477a8 EFLAGS: 00010246
      [  172.708803] RAX: ffffa3c96615a014 RBX: 0000000000000000 RCX: ffff8a4b575301a0
      < snip >
      [  172.782403] Call Trace:
      [  172.784857]  <TASK>
      [  172.786963]  ice_up_complete+0x6f/0x210 [ice]
      [  172.791349]  ice_xdp+0x136/0x320 [ice]
      [  172.795108]  ? ice_change_mtu+0x180/0x180 [ice]
      [  172.799648]  dev_xdp_install+0x61/0xe0
      [  172.803401]  dev_xdp_attach+0x1e0/0x550
      [  172.807240]  dev_change_xdp_fd+0x1e6/0x220
      [  172.811338]  do_setlink+0xee8/0x1010
      [  172.814917]  rtnl_setlink+0xe5/0x170
      [  172.818499]  ? bpf_lsm_binder_set_context_mgr+0x10/0x10
      [  172.823732]  ? security_capable+0x36/0x50
      < snip >
      
      Fix this by replacing 'do { } while (cmpxchg())' with an "infinite"
      for-loop with an explicit break.
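The for-loop shape can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: `STATE_SCHED`, `STATE_NPSVC`, `disable_state()` and the `relax` callback are hypothetical stand-ins (the callback plays the role of sleeping while another context clears the bits), and the code is not the kernel implementation.

```c
#include <stdatomic.h>

#define STATE_SCHED (1UL << 0)   /* hypothetical stand-ins for NAPIF_STATE_* */
#define STATE_NPSVC (1UL << 1)

/* Stand-in for "another context finished polling and cleared the bits". */
void clear_busy_bits(atomic_ulong *state)
{
	atomic_fetch_and(state, ~(STATE_SCHED | STATE_NPSVC));
}

/*
 * Wait until neither bit is set, then set both atomically. Returns how many
 * times the loop had to wait. In a for-loop, 'continue' jumps back to the
 * re-read of the state, so new_val is always computed from a fresh val
 * before the compare-and-swap runs.
 */
unsigned int disable_state(atomic_ulong *state, void (*relax)(atomic_ulong *))
{
	unsigned long val, new_val;
	unsigned int retries = 0;

	for (;;) {
		val = atomic_load(state);
		if (val & (STATE_SCHED | STATE_NPSVC)) {
			retries++;
			relax(state);
			continue;        /* back to the top: re-read the state */
		}
		new_val = val | STATE_SCHED | STATE_NPSVC;
		if (atomic_compare_exchange_strong(state, &val, new_val))
			break;           /* explicit exit once the swap succeeded */
	}
	return retries;
}
```

With a do/while loop, the same 'continue' would jump to the loop condition and run the compare-and-swap with a new value that was never assigned.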
      
      From v1 [0]:
       - just use a for-loop to simplify both the fix and the existing
         code (Eric).
      
      [0] https://lore.kernel.org/netdev/20211110191126.1214-1-alexandr.lobakin@intel.com
      
      Fixes: 719c5719 ("net: make napi_disable() symmetric with enable")
      Suggested-by: Eric Dumazet <edumazet@google.com> # for-loop
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211110195605.1304-1-alexandr.lobakin@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0315a075
  5. 10 Nov, 2021 4 commits
    • net/smc: fix sk_refcnt underflow on linkdown and fallback · e5d5aadc
      Dust Li committed
      We got the following WARNING when running an ab/nginx
      test with RDMA link flapping (up-down-up).
      The reason is that when an smc_sock fallback and a linkdown
      happen simultaneously, we may get the following situation:
      
      __smc_lgr_terminate()
       --> smc_conn_kill()
          --> smc_close_active_abort()
                 smc_sock->sk_state = SMC_CLOSED
                 sock_put(smc_sock)
      
      smc_sock was set to SMC_CLOSED and sock_put() was called
      when terminating the link group. But later the application calls
      close() on the socket, and then we get:
      
      __smc_release():
          if (smc_sock->fallback)
              smc_sock->sk_state = SMC_CLOSED
              sock_put(smc_sock)
      
      Again we set the smc_sock to CLOSED even though it's already
      in the CLOSED state, and put the refcnt a second time, so the
      following warning happens:
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0
      Modules linked in:
      CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403
      Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
      RIP: 0010:refcount_warn_saturate+0x8d/0xf0
      Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48
      
      RSP: 0018:ffffc90000527e50 EFLAGS: 00010286
      RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027
      RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001
      R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340
      R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8
      FS:  00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       smc_release+0x353/0x3f0
       __sock_release+0x3d/0xb0
       sock_close+0x11/0x20
       __fput+0x93/0x230
       task_work_run+0x65/0xa0
       exit_to_user_mode_prepare+0xf9/0x100
       syscall_exit_to_user_mode+0x27/0x190
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This patch adds a check in __smc_release() to make
      sure we won't do an extra sock_put() or set the
      socket to CLOSED when it's already in the CLOSED state.
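A minimal sketch of the guard, assuming a toy socket with a plain integer reference count. `struct fake_sock`, `abort_close()` and `release_once()` are hypothetical names that only model the two paths shown above.

```c
/* Hypothetical, simplified socket model; not the kernel's struct sock. */
enum sock_state { STATE_ACTIVE, STATE_CLOSED };

struct fake_sock {
	enum sock_state state;
	int refcnt;
	int fallback;
};

/* Models the link-group termination path: close and drop one reference. */
void abort_close(struct fake_sock *sk)
{
	if (sk->state != STATE_CLOSED) {
		sk->state = STATE_CLOSED;
		sk->refcnt--;
	}
}

/* Models the release path: only put the reference if not closed already. */
void release_once(struct fake_sock *sk)
{
	if (sk->fallback && sk->state != STATE_CLOSED) {
		sk->state = STATE_CLOSED;
		sk->refcnt--;
	}
}
```

Whichever path runs first performs the put; the state check turns the second path into a no-op.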
      
      Fixes: 51f1de79 ("net/smc: replace sock_put worker by socket refcounting")
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5d5aadc
    • vsock: prevent unnecessary refcnt inc for nonblocking connect · c7cd82b9
      Eiichi Tsukata committed
      Currently vsock_connect() increments the sock refcount for a nonblocking
      socket each time it's called, which can lead to a memory leak if
      it's called multiple times, because the connect timeout function
      decrements the sock refcount only once.
      
      Fix it by making vsock_connect() return -EALREADY immediately when
      the sock state is already SS_CONNECTING.
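The early return can be sketched as follows, assuming a toy socket model. `struct fake_vsock`, the `conn_state` values and `nonblocking_connect()` are hypothetical and only illustrate the reference accounting.

```c
#include <errno.h>

/* Hypothetical, simplified connection states and socket. */
enum conn_state { UNCONNECTED, CONNECTING, CONNECTED };

struct fake_vsock {
	enum conn_state state;
	int refcnt;
};

/* Start a nonblocking connect, taking at most one reference for the timeout. */
int nonblocking_connect(struct fake_vsock *sk)
{
	if (sk->state == CONNECTING)
		return -EALREADY;  /* timeout already armed: take no new reference */
	if (sk->state == CONNECTED)
		return -EISCONN;

	sk->state = CONNECTING;
	sk->refcnt++;              /* held until the connect timeout handler runs */
	return -EINPROGRESS;
}
```

Repeated calls while the connect is pending leave the count unchanged, so the single decrement in the timeout handler balances it.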
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7cd82b9
    • net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any · 6dc25401
      Eric Dumazet committed
      1) if q->tk_offset == TK_OFFS_MAX, then get_tcp_tstamp() calls
         ktime_mono_to_any() with an out-of-bounds value.
      
      2) if q->tk_offset is changed in taprio_parse_clockid(),
         taprio_get_time() might also call ktime_mono_to_any()
         with an out-of-bounds value, as syzbot found:
      
      UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:908:27
      index 3 is out of range for type 'ktime_t *[3]'
      CPU: 1 PID: 25668 Comm: kworker/u4:0 Not tainted 5.15.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
       __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291
       ktime_mono_to_any+0x1d4/0x1e0 kernel/time/timekeeping.c:908
       get_tcp_tstamp net/sched/sch_taprio.c:322 [inline]
       get_packet_txtime net/sched/sch_taprio.c:353 [inline]
       taprio_enqueue_one+0x5b0/0x1460 net/sched/sch_taprio.c:420
       taprio_enqueue+0x3b1/0x730 net/sched/sch_taprio.c:485
       dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3785
       __dev_xmit_skb net/core/dev.c:3869 [inline]
       __dev_queue_xmit+0x1f6e/0x3630 net/core/dev.c:4194
       batadv_send_skb_packet+0x4a9/0x5f0 net/batman-adv/send.c:108
       batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline]
       batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline]
       batadv_iv_send_outstanding_bat_ogm_packet+0x6d7/0x8e0 net/batman-adv/bat_iv_ogm.c:1701
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
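The out-of-bounds index in this trace can be avoided by handling the sentinel value before indexing. The sketch below assumes a three-entry offset table, as the UBSAN message suggests; `enum clock_offs`, `offsets_ns` and `mono_to_clock()` are hypothetical names, and the offset values are arbitrary.

```c
#include <stdint.h>

/* Hypothetical mirror of a clock-offset enum whose last value is a sentinel. */
enum clock_offs { OFFS_REAL, OFFS_BOOT, OFFS_TAI, OFFS_MAX };

/* Arbitrary illustrative offsets, one per real clock (OFFS_MAX has none). */
static const int64_t offsets_ns[OFFS_MAX] = { 1000, 2000, 3000 };

/* Convert a monotonic timestamp; the sentinel means "no conversion". */
int64_t mono_to_clock(int64_t mono_ns, enum clock_offs offs)
{
	if ((unsigned int)offs >= OFFS_MAX)
		return mono_ns;    /* never index the table with the sentinel */
	return mono_ns + offsets_ns[offs];
}
```

Reading the offset once into a local before the check also keeps a concurrent update from changing it between the check and the table lookup.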
      
      Fixes: 7ede7b03 ("taprio: make clock reference conversions easier")
      Fixes: 54002066 ("taprio: Adjust timestamps for TCP packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vedang Patel <vedang.patel@intel.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Link: https://lore.kernel.org/r/20211108180815.1822479-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6dc25401
    • sections: move and rename core_kernel_data() to is_kernel_core_data() · a20deb3a
      Kefeng Wang committed
      Move core_kernel_data() into sections.h and rename it to
      is_kernel_core_data(), make it return a bool value, and update all the
      callers.
      
      Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a20deb3a
  6. 09 Nov, 2021 5 commits
    • bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg · b2c46181
      Jussi Maki committed
      The current conversion of skb->data_end reads like this:
      
        ; data_end = (void*)(long)skb->data_end;
         559: (79) r1 = *(u64 *)(r2 +200)   ; r1  = skb->data
         560: (61) r11 = *(u32 *)(r2 +112)  ; r11 = skb->len
         561: (0f) r1 += r11
         562: (61) r11 = *(u32 *)(r2 +116)
         563: (1f) r1 -= r11
      
      But similar to the case in 84f44df6 ("bpf: sock_ops sk access may stomp
      registers when dst_reg = src_reg"), the code will read an incorrect skb->len
      when src == dst. In this case we end up generating this xlated code:
      
        ; data_end = (void*)(long)skb->data_end;
         559: (79) r1 = *(u64 *)(r1 +200)   ; r1  = skb->data
         560: (61) r11 = *(u32 *)(r1 +112)  ; r11 = (skb->data)->len
         561: (0f) r1 += r11
         562: (61) r11 = *(u32 *)(r1 +116)
         563: (1f) r1 -= r11
      
      ... where line 560 reads 4B at (skb->data + 112) instead of the
      intended skb->len. Here the skb pointer in r1 gets set to skb->data and the
      later deref for skb->len ends up following skb->data instead of skb.
      
      This fixes the issue similarly to the patch mentioned above by creating an
      additional temporary variable and using it to store the register when
      dst_reg = src_reg. We name the variable bpf_temp_reg and place it in the cb
      context for sk_skb. Then we restore from the temp to ensure nothing is lost.
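The aliasing hazard can be modelled in userspace with an array standing in for registers. This is a sketch under stated assumptions: `struct fake_skb`, `load_data_end()` and the register layout are hypothetical, and the stash-in-a-temp approach shown is one way to avoid the clobber rather than a reproduction of the generated instructions.

```c
#include <stdint.h>

#define REG_SCRATCH 11  /* plays the role of r11 in the listings above */

/* Hypothetical, simplified skb: data_end = data + len - data_len. */
struct fake_skb {
	uintptr_t data;
	uint32_t len;
	uint32_t data_len;
};

/*
 * Emulate "dst = skb->data_end" where reg[src] holds the skb pointer.
 * The skb pointer is stashed in *temp first, so every later load still
 * dereferences the skb even when dst and src are the same register.
 */
void load_data_end(uintptr_t reg[], int dst, int src, uintptr_t *temp)
{
	*temp = reg[src];                                  /* save skb pointer */
	reg[dst] = ((const struct fake_skb *)*temp)->data; /* may clobber src */
	reg[REG_SCRATCH] = ((const struct fake_skb *)*temp)->len;
	reg[dst] += reg[REG_SCRATCH];
	reg[REG_SCRATCH] = ((const struct fake_skb *)*temp)->data_len;
	reg[dst] -= reg[REG_SCRATCH];
}
```

Without the stash, the second load would read through reg[src] after it had already been overwritten with skb->data, which is the failure shown in the second listing.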
      
      Fixes: 16137b09 ("bpf: Compute data_end dynamically with JIT code")
      Signed-off-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-6-john.fastabend@gmail.com
      b2c46181
    • bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding · e0dc3b93
      John Fastabend committed
      Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling
      progress, e.g. offset and length of the skb. First this is poorly named and
      inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at
      this layer.
      
      But, more importantly strparser is using the following to access its metadata.
      
        (struct _strp_msg *)((void *)skb->cb + offsetof(struct qdisc_skb_cb, data))
      
      Where _strp_msg is defined as:
      
        struct _strp_msg {
              struct strp_msg            strp;                 /*     0     8 */
              int                        accum_len;            /*     8     4 */
      
              /* size: 12, cachelines: 1, members: 2 */
              /* last cacheline: 12 bytes */
        };
      
      So we use 12 bytes of ->data[] in the struct. However, in BPF code running
      parser and verdict, the user has read access to the data[] array as well.
      It's not too problematic, but we should not be exposing internal state to
      BPF programs. If it's really needed, we can use the probe_read() APIs, which
      allow reading kernel memory. And I don't believe the cb[] layer poses any API
      breakage by moving this around, because programs can't depend on cb[] across
      layers.
      
      In order to fix another issue with a ctx rewrite we need to stash a temp
      variable somewhere. To make this work cleanly this patch builds a cb struct
      for sk_skb types called sk_skb_cb struct. Then we can use this consistently
      in the strparser, sockmap space. Additionally we can start allowing ->cb[]
      write access after this.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jussi Maki <joamaki@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com
      e0dc3b93
    • bpf, sockmap: Fix race in ingress receive verdict with redirect to self · c5d2177a
      John Fastabend committed
      A socket in a sockmap may have different combinations of programs attached
      depending on configuration. There can be no programs, in which case the
      socket acts as a sink only. There can be a TX program only, in which case a
      BPF program is attached to the sending side but no RX program is attached.
      There can be an RX program only, where sends have no BPF program attached
      but receives are hooked with BPF. And finally, both TX and RX programs may
      be attached. This gives us the permutations:
      
       None, Tx, Rx, and TxRx
      
      To date, most of our use cases have been either the TX case, used as a fast
      datapath to copy directly between a local application and a userspace proxy,
      or the RX and TXRX cases, where applications operate an in-kernel proxy. The
      traffic in the first case, where we hook applications into a userspace
      application, looks like this:
      
        AppA  redirect   AppB
         Tx <-----------> Rx
         |                |
         +                +
         TCP <--> lo <--> TCP
      
      In this case all traffic from AppA (after 3whs) is copied into the AppB
      ingress queue and no traffic is ever on the TCP receive_queue.
      
      In the second case the application never receives, except in some rare error
      cases, traffic on the actual user space socket. Instead the send happens in
      the kernel.
      
                 AppProxy       socket pool
             sk0 ------------->{sk1,sk2, skn}
              ^                      |
              |                      |
              |                      v
             ingress              lb egress
             TCP                  TCP
      
      Here because traffic is never read off the socket with userspace recv() APIs
      there is only ever one reader on the sk receive_queue. Namely the BPF programs.
      
      However, we've started to introduce a third configuration where the BPF program
      on receive should process the data, but then the normal case is to push the
      data into the receive queue of AppB.
      
             AppB
             recv()                (userspace)
           -----------------------
             tcp_bpf_recvmsg()     (kernel)
               |             |
               |             |
               |             |
             ingress_msgQ    |
               |             |
             RX_BPF          |
               |             |
               v             v
             sk->receive_queue
      
      This is different from the App{A,B} redirect because traffic is first received
      on the sk->receive_queue.
      
      Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg
      queue for any data handled by the BPF RX program and returned with the PASS
      code, so that it was enqueued on the ingress msg queue. Then, if no data
      exists on that queue, it checks the socket receive queue. Unfortunately, this
      is the same receive_queue the BPF program is reading data off of, so we get a
      race. It's possible for the recvmsg() hook to pull data off the receive_queue
      before the BPF hook has a chance to read it. It typically happens when an
      application is banging on recv() and getting EAGAINs until it manages to
      race with the RX BPF program.
      
      To fix this we note that before this patch at attach time when the socket is
      loaded into the map we check if it needs a TX program or just the base set of
      proto bpf hooks. Then it uses the above general RX hook regardless of if we
      have a BPF program attached at rx or not. This patch now extends this check to
      handle all cases enumerated above, TX, RX, TXRX, and none. And to fix above
      race when an RX program is attached we use a new hook that is nearly identical
      to the old one except now we do not let the recv() call skip the RX BPF program.
      Now only the BPF program pulls data from sk->receive_queue and recv() only
      pulls data from the ingress msgQ post BPF program handling.
      
      With this resolved, our AppB from above has been up and running for many hours
      without detecting any errors. We do this by correlating counters in RX BPF
      events and the AppB to ensure data is never skipping the BPF program. Selftests
      were not able to detect this because we only run them for a short period of
      time on well-ordered send/recvs, so we don't get any of the noise we see in
      real application environments.
      
      Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jussi Maki <joamaki@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com
      c5d2177a
    • bpf, sockmap: Remove unhash handler for BPF sockmap usage · b8b8315e
      John Fastabend committed
      We do not need to handle unhash from the BPF side; we can simply wait for
      the close to happen. The original concern was that a socket could transition
      from the ESTABLISHED state to a new state while the BPF hook was still
      attached. But we convinced ourselves this is no longer possible, and we also
      improved BPF sockmap to handle listen sockets, so this is no longer a problem.
      
      More importantly, though, there are cases where unhash is called when data is
      in the receive queue. The BPF unhash logic will flush this data, which is
      wrong. To be correct it should keep the data in the receive queue and allow
      a receiving application to continue reading the data. This may happen when
      tcp_abort() is received, for example. Instead of complicating the logic in
      unhash, simply moving all this to the tcp_close() hook solves this.
      
      Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jussi Maki <joamaki@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com
      b8b8315e
    • bpf, sockmap: Use stricter sk state checks in sk_lookup_assign · 40a34121
      John Fastabend committed
      In order to fix an issue with sockets in TCP sockmap redirect cases we plan
      to allow CLOSE state sockets to exist in the sockmap. However, the check in
      bpf_sk_lookup_assign() currently only invalidates sockets in the
      TCP_ESTABLISHED case, relying on the checks on sockmap insert to ensure we
      never have SOCK_CLOSE state sockets in the map.
      
      To prepare for this change we flip the logic in bpf_sk_lookup_assign() to
      explicitly test for the accepted cases: namely, a TCP socket in the
      TCP_LISTEN state or a UDP socket in the TCP_CLOSE state. This also makes the
      code more resilient to future changes.
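The flipped check amounts to an allowlist rather than a denylist. A minimal sketch, assuming toy protocol and state enums; `enum proto`, `enum sk_state` and `lookup_assign_allowed()` are hypothetical names, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified protocol and socket-state enums. */
enum proto { PROTO_TCP, PROTO_UDP, PROTO_OTHER };
enum sk_state { ST_ESTABLISHED, ST_LISTEN, ST_CLOSE };

/* Accept only the explicitly listed combinations; reject everything else. */
bool lookup_assign_allowed(enum proto proto, enum sk_state state)
{
	if (proto == PROTO_TCP)
		return state == ST_LISTEN;
	if (proto == PROTO_UDP)
		return state == ST_CLOSE;
	return false;
}
```

A state or protocol added later is rejected by default, which is what makes this form more robust than testing for a single forbidden state.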
      
      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-2-john.fastabend@gmail.com
      40a34121
  7. 08 Nov, 2021 1 commit
  8. 07 Nov, 2021 4 commits
  9. 06 Nov, 2021 1 commit
  10. 05 Nov, 2021 7 commits
  11. 04 Nov, 2021 3 commits
    • 9p: fix a bunch of checkpatch warnings · 6e195b0f
      Dominique Martinet committed
      Sohaib Mohamed started a series of tiny and incomplete checkpatch fixes but
      seemingly stopped halfway -- take over and do most of it.
      This is still missing net/9p/trans* and net/9p/protocol.c for a later
      time...
      
      Link: http://lkml.kernel.org/r/20211102134608.1588018-3-dominique.martinet@atmark-techno.com
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      6e195b0f
    • net: fix possible NULL deref in sock_reserve_memory · d00c8ee3
      Eric Dumazet committed
      The sanity check in sock_reserve_memory() was not enough to prevent a
      malicious user from triggering a NULL deref.
      
      In this case, the issue is that sk_prot->memory_allocated is NULL.
      
      Use standard sk_has_account() helper to deal with this.
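The shape of such a helper-based check can be sketched as follows. `struct fake_proto`, `has_account()` and `reserve_memory()` are hypothetical userspace stand-ins, and the choice of -EOPNOTSUPP as the error code is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol descriptor: accounting is optional per protocol. */
struct fake_proto {
	long *memory_allocated;    /* NULL when the protocol does no accounting */
};

/* True only when the protocol provides a memory accounting counter. */
int has_account(const struct fake_proto *prot)
{
	return prot->memory_allocated != NULL;
}

/* Reserve pages, refusing protocols that have no counter to charge. */
int reserve_memory(struct fake_proto *prot, long pages)
{
	if (!has_account(prot))
		return -EOPNOTSUPP;

	*prot->memory_allocated += pages;  /* safe: pointer checked above */
	return 0;
}
```

Routing the check through one helper keeps every caller that touches the counter guarded in the same way.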
      
      BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
      BUG: KASAN: null-ptr-deref in atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
      BUG: KASAN: null-ptr-deref in sk_memory_allocated_add include/net/sock.h:1371 [inline]
      BUG: KASAN: null-ptr-deref in sock_reserve_memory net/core/sock.c:994 [inline]
      BUG: KASAN: null-ptr-deref in sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
      Write of size 8 at addr 0000000000000000 by task syz-executor.0/11270
      
      CPU: 1 PID: 11270 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       __kasan_report mm/kasan/report.c:446 [inline]
       kasan_report.cold+0x66/0xdf mm/kasan/report.c:459
       check_region_inline mm/kasan/generic.c:183 [inline]
       kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
       atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
       sk_memory_allocated_add include/net/sock.h:1371 [inline]
       sock_reserve_memory net/core/sock.c:994 [inline]
       sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
       __sys_setsockopt+0x4f8/0x610 net/socket.c:2172
       __do_sys_setsockopt net/socket.c:2187 [inline]
       __se_sys_setsockopt net/socket.c:2184 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2184
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f56076d5ae9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5604c4b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007f56077e8f60 RCX: 00007f56076d5ae9
      RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000003
      RBP: 00007f560772ff25 R08: 000000000000fec7 R09: 0000000000000000
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffb61a100f R14: 00007f5604c4b300 R15: 0000000000022000
       </TASK>
      
      Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d00c8ee3
    • L
      tcp: Use BIT() for OPTION_* constants · 3b65abb8
      Leonard Crestez committed
      Extending these flags using the existing (1 << x) pattern triggers
      complaints from checkpatch. Instead of ignoring checkpatch modify the
      existing values to use BIT(x) style in a separate commit.
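
For illustration, the two styles side by side. This is a sketch only: `BIT()` is redefined locally so the snippet stands alone, and the `EXAMPLE_OPT_*` names are hypothetical rather than the actual OPTION_* constants:

```c
#include <assert.h>

/* Same shape as the kernel's BIT() helper; redefined so the sketch is self-contained. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical option flags in the old (1 << x) style. */
#define EXAMPLE_OPT_A_OLD (1 << 0)
#define EXAMPLE_OPT_B_OLD (1 << 3)

/* The same flags in BIT(x) style. */
#define EXAMPLE_OPT_A BIT(0)
#define EXAMPLE_OPT_B BIT(3)

/* The conversion is purely stylistic: the numeric values do not change. */
static_assert(EXAMPLE_OPT_A == EXAMPLE_OPT_A_OLD, "bit 0 keeps its value");
static_assert(EXAMPLE_OPT_B == EXAMPLE_OPT_B_OLD, "bit 3 keeps its value");

/* Returns non-zero when the given flag is present in the option mask. */
static int option_is_set(unsigned long options, unsigned long flag)
{
	return (options & flag) != 0;
}
```
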
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b65abb8
  12. 03 Nov, 2021 10 commits
    • V
      net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge · 92f62485
      Vladimir Oltean committed
      Normally it is expected that the dsa_device_ops :: rcv() method finishes
      parsing the DSA tag and consumes it, then never looks at it again.
      
      But commit c0bcf537 ("net: dsa: ocelot: add hardware timestamping
      support for Felix") added support for RX timestamping in a very
      unconventional way. On this switch, a partial timestamp is available in
      the DSA header, but the driver got away with not parsing that timestamp
      right away, but instead delayed that parsing for a little longer:
      
      dsa_switch_rcv():
      	nskb = cpu_dp->rcv(skb, dev); <------------- not here
      	-> ocelot_rcv()
      	...
      
      	skb = nskb;
      	skb_push(skb, ETH_HLEN);
      	skb->pkt_type = PACKET_HOST;
      	skb->protocol = eth_type_trans(skb, skb->dev);
      
      	...
      
      	if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here
      	-> felix_rxtstamp()
      		return 0;
      
      When in felix_rxtstamp(), this driver accounted for the fact that
      eth_type_trans() happened in the meanwhile, so it got a hold of the
      extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes
      from the current skb->data.
      
      This worked for quite some time but was quite fragile from the very
      beginning. Not to mention that having DSA tag parsing split in two
      different files, under different folders (net/dsa/tag_ocelot.c vs
      drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to
      come that they might break this.
      
      Finally, the blamed commit does the following: at the end of
      ocelot_rcv(), it checks whether the skb payload contains a VLAN header.
      If it does, and this port is under a VLAN-aware bridge, that VLAN ID
      might not be correct in the sense that the packet might have suffered
      VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID
      from the skb payload using __skb_vlan_pop(), and take the classified
      VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the
      classified VLAN, and the skb payload is VLAN-untagged.
      
      The big problem is that __skb_vlan_pop() does:
      
      	memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
      	__skb_pull(skb, VLAN_HLEN);
      
      aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes
      from the skb headroom (effectively also moving skb->data, by definition).
      So for felix_rxtstamp()'s fragile logic, all bets are off now.
      Instead of having the "extraction" pointer point to the DSA header,
      it actually points to 4 bytes _inside_ the extraction header.
      Corollary, the last 4 bytes of the "extraction" header are in fact 4
      stale bytes of the destination MAC address from the Ethernet header,
      from prior to the __skb_vlan_pop() movement.
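
The pointer arithmetic can be replayed on a plain byte buffer. The following is a userspace sketch, not driver code: `TAG_LEN` is an assumed tag length chosen for illustration, `frame` stands in for the packet buffer with the DSA tag at offset 0, and `stale_extraction_offset()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN  6
#define ETH_HLEN  14
#define VLAN_HLEN 4
#define TAG_LEN   16	/* assumed DSA tag length, for illustration only */

/*
 * Replays the sequence from the commit message on a flat buffer laid out as
 * [tag][dst MAC][src MAC][VLAN header][EtherType]...
 * Returns the buffer offset at which the late "extraction header" lookup lands.
 */
static size_t stale_extraction_offset(unsigned char *frame)
{
	size_t data = TAG_LEN;	/* data points at the Ethernet header, just past the tag */

	assert(frame != NULL);

	/* The two statements quoted from __skb_vlan_pop() */
	memmove(frame + data + VLAN_HLEN, frame + data, 2 * ETH_ALEN);
	data += VLAN_HLEN;	/* __skb_pull(skb, VLAN_HLEN) */

	data += ETH_HLEN;	/* eth_type_trans() pulls the Ethernet header */

	/* Late lookup: step back by (ETH_HLEN + TAG_LEN) from the current data */
	return data - (ETH_HLEN + TAG_LEN);
}
```

With the tag at offset 0, the helper returns 4 instead of 0, and bytes `TAG_LEN` through `TAG_LEN + 3` still hold the first four bytes of the old destination MAC, matching the "4 stale bytes" described above.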
      
      So of course, RX timestamps are completely bogus when the system is
      configured in this way.
      
      The fix is actually very simple: just don't structure the code like that.
      For better or worse, the DSA PTP timestamping API does not offer a
      straightforward way for drivers to present their RX timestamps, but
      other drivers (sja1105) have established a simple mechanism to carry
      their RX timestamp from dsa_device_ops :: rcv() all the way to
      dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to
      simply save the partial timestamp to the skb->cb, and complete it later.
      
      Question: why don't we simply populate the skb's struct
      skb_shared_hwtstamps from ocelot_rcv(), and bother with this
      complication of propagating the timestamp to felix_rxtstamp()?
      
      Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether
      PTP packets need sleepable context to retrieve the full RX timestamp.
      Currently felix_rxtstamp() answers "no, thanks" to that question, and
      calls ocelot_ptp_gettime64() from softirq atomic context. This is
      understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so
      hardware access does not require sleeping. But the felix driver is
      preparing for the introduction of other switches where hardware access
      is over a slow bus like SPI or MDIO:
      https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/
      
      So I would like to keep this code structure, so the rework needed when
      that driver will need PTP support will be minimal (answer "yes, I need
      deferred context for this skb's RX timestamp", and the partial
      timestamp will still be found in the skb->cb).
      
      Fixes: ea440cd2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available")
      Reported-by: Po Liu <po.liu@nxp.com>
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92f62485
    • Z
      net: vlan: fix a UAF in vlan_dev_real_dev() · 563bcbae
      Ziyang Xuan committed
      The real_dev of a vlan net_device may be freed after
      unregister_vlan_dev(). Continuing to access real_dev through
      vlan_dev_real_dev() then triggers a use-after-free on real_dev,
      as in the following report:
      
      ==================================================================
      BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
      Call Trace:
       kasan_report.cold+0x83/0xdf
       vlan_dev_real_dev+0xf9/0x120
       is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
       is_eth_port_of_netdev_filter+0x28/0x40
       ib_enum_roce_netdev+0x1a3/0x300
       ib_enum_all_roce_netdevs+0xc7/0x140
       netdevice_event_work_handler+0x9d/0x210
      ...
      
      Freed by task 9288:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x30
       __kasan_slab_free+0xfc/0x130
       slab_free_freelist_hook+0xdd/0x240
       kfree+0xe4/0x690
       kvfree+0x42/0x50
       device_release+0x9f/0x240
       kobject_put+0x1c8/0x530
       put_device+0x1b/0x30
       free_netdev+0x370/0x540
       ppp_destroy_interface+0x313/0x3d0
      ...
      
      Move the put_device(real_dev) to vlan_dev_free() to ensure that
      real_dev is not freed before the vlan_dev is unregistered.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      563bcbae
    • M
      net: udp6: replace __UDP_INC_STATS() with __UDP6_INC_STATS() · 250962e4
      Menglong Dong committed
      __UDP_INC_STATS() is used in udpv6_queue_rcv_one_skb() when encap_rcv()
      fails. __UDP6_INC_STATS() should be used here, so replace it with
      __UDP6_INC_STATS().
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      250962e4
    • J
      ethtool: fix ethtool msg len calculation for pause stats · 1aabe578
      Jakub Kicinski committed
      ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id,
      so we need to subtract non-stats and add one to
      get a count (IOW -2+1 == -1).
      
      Otherwise we'll see:
      
        ethnl cmd 21: calculated reply length 40, but consumed 52
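
The arithmetic can be checked with a small sketch. The enum below is illustrative only: its `EXAMPLE_STAT_*` names are hypothetical, and its shape (two non-stat ids followed by the stats) is an assumption based on the description above:

```c
#include <assert.h>

/* Illustrative attribute ids: two non-stat ids first, then the actual stats. */
enum {
	EXAMPLE_STAT_UNSPEC,		/* non-stat */
	EXAMPLE_STAT_PAD,		/* non-stat */
	EXAMPLE_STAT_TX_FRAMES,
	EXAMPLE_STAT_RX_FRAMES,
	__EXAMPLE_STAT_CNT,
	EXAMPLE_STAT_MAX = __EXAMPLE_STAT_CNT - 1	/* highest id, not a count */
};

/* Subtract the two non-stat ids, then add one because MAX is an id (0-based). */
static int example_stat_count(void)
{
	return EXAMPLE_STAT_MAX - 2 + 1;
}

/* Two stats are defined above, and MAX - 2 + 1 is the same as MAX - 1. */
static_assert(EXAMPLE_STAT_MAX - 2 + 1 == 2, "two stats expected");
static_assert(EXAMPLE_STAT_MAX - 2 + 1 == EXAMPLE_STAT_MAX - 1, "-2 + 1 == -1");
```

Sizing the reply with `MAX - 2` would count one stat too few, which is consistent with a reply that consumes more bytes than were calculated.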
      
      Fixes: 9a27a330 ("ethtool: add standard pause stats")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aabe578
    • T
      net: avoid double accounting for pure zerocopy skbs · 9b65b17d
      Talal Ahmad committed
      Track skbs containing only zerocopy data and avoid charging them to
      kernel memory to correctly account the memory utilization for
      msg_zerocopy. All of the data in such skbs is held in user pages which
      are already accounted to user. Before this change, they are charged
      again in kernel in __zerocopy_sg_from_iter. The charging in kernel is
      excessive because data is not being copied into skb frags. This
      excessive charging can lead to kernel going into memory pressure
      state which impacts all sockets in the system adversely. Mark pure
      zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
      charge/uncharge for data in such skbs.
      
      Initially, an skb is marked pure zerocopy when it is empty and in
      zerocopy path. skb can then change from a pure zerocopy skb to mixed
      data skb (zerocopy and copy data) if it is at tail of write queue and
      there is room available in it and non-zerocopy data is being sent in
      the next sendmsg call. At this time sk_mem_charge is done for the pure
      zerocopied data and the pure zerocopy flag is unmarked. We found that
      this happens very rarely on workloads that pass MSG_ZEROCOPY.
      
      A pure zerocopy skb can later be coalesced into normal skb if they are
      next to each other in queue but this patch prevents coalescing from
      happening. This avoids complexity of charging when skb downgrades from
      pure zerocopy to mixed. This is also rare.
      
      In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
      for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in
      tcp_skb_entail for an skb without data.
      
      Testing with the msg_zerocopy.c benchmark between two hosts (100G NICs)
      with zerocopy showed that before this patch the 'sock' variable in
      memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
      sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
      change it is 0. This is due to no charge to sk_forward_alloc for
      zerocopy data and shows memory utilization for kernel is lowered.
      
      With this commit we don't see the warning we saw in previous commit
      which resulted in commit 84882cf7.
      Signed-off-by: Talal Ahmad <talalahmad@google.com>
      Acked-by: Arjun Roy <arjunroy@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b65b17d
    • Z
      net:ipv6:Remove unneeded semicolon · acaea0d5
      Zhang Mingyu committed
      Eliminate the following coccinelle check warning:
      net/ipv6/seg6.c:381:2-3
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acaea0d5
    • L
      NFC: add necessary privilege flags in netlink layer · aedddb4e
      Lin Ma committed
      The CAP_NET_ADMIN checks are needed to prevent attackers from faking a
      device under NCIUARTSETDRIVER and exploiting privileged commands.
      
      This patch adds GENL_ADMIN_PERM flags in genl_ops to enforce the check,
      except for commands like NFC_CMD_GET_DEVICE, NFC_CMD_GET_TARGET,
      NFC_CMD_LLC_GET_PARAMS, and NFC_CMD_GET_SE, which are mainly
      information-read operations.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aedddb4e
    • X
      security: add sctp_assoc_established hook · 7c2ef024
      Xin Long committed
      security_sctp_assoc_established() is added to replace
      security_inet_conn_established() called in
      sctp_sf_do_5_1E_ca(), so that asoc can be accessed in security
      subsystem and save the peer secid to asoc->peer_secid.
      
      v1->v2:
        - fix the return value of security_sctp_assoc_established() in
          security.h, found by kernel test robot and Ondrej.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c2ef024
    • X
      security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ce · e215dab1
      Xin Long committed
      The asoc created when the INIT chunk is received is a temporary one; it
      is deleted after the INIT_ACK chunk is sent in reply. So for the real asoc
      created in sctp_sf_do_5_1D_ce() when the COOKIE_ECHO chunk is received,
      security_sctp_assoc_request() should also be called.
      
      v1->v2:
        - fix some typo and grammar errors, noticed by Ondrej.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e215dab1
    • X
      security: pass asoc to sctp_assoc_request and sctp_sk_clone · c081d53f
      Xin Long committed
      This patch is to move secid and peer_secid from endpoint to association,
      and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As
      ep is the local endpoint and asoc represents a connection, and in SCTP
      one sk/ep could have multiple asoc/connection, saving secid/peer_secid
      for new asoc will overwrite the old asoc's.
      
      Note that since asoc can be passed as NULL, security_sctp_assoc_request()
      is moved to the place right after the new_asoc is created in
      sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init().
      
      v1->v2:
        - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed.
        - fix the annotation in selinux_sctp_assoc_request(), as Richard noticed.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c081d53f