1. 30 3月, 2020 1 次提交
    • Q
      sctp: fix refcount bug in sctp_wfree · 5c3e82fe
      Qiujun Huang 提交于
      We should iterate over the datamsgs to move
      all chunks(skbs) to newsk.
      
      The following case cause the bug:
      for the trouble SKB, it was in outq->transmitted list
      
      sctp_outq_sack
              sctp_check_transmitted
                      SKB was moved to outq->sacked list
              then throw away the sack queue
                      SKB was deleted from outq->sacked
      (but it was held by datamsg at sctp_datamsg_to_asoc
      So, sctp_wfree was not called here)
      
      then migrate happened
      
              sctp_for_each_tx_datachunk(
              sctp_clear_owner_w);
              sctp_assoc_migrate();
              sctp_for_each_tx_datachunk(
              sctp_set_owner_w);
      SKB was not in the outq, and was not changed to newsk
      
      finally
      
      __sctp_outq_teardown
              sctp_chunk_put (for another skb)
                      sctp_datamsg_put
                              __kfree_skb(msg->frag_list)
                                      sctp_wfree (for SKB)
      	SKB->sk was still oldsk (skb->sk != asoc->base.sk).
      
      Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
      Signed-off-by: NQiujun Huang <hqjagain@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c3e82fe
  2. 09 3月, 2020 1 次提交
  3. 18 2月, 2020 1 次提交
  4. 07 1月, 2020 1 次提交
    • X
      sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY · be7a7729
      Xin Long 提交于
      This patch is to fix a memleak caused by no place to free cmd->obj.chunk
      for the unprocessed SCTP_CMD_REPLY. This issue occurs when failing to
      process a cmd while there're still SCTP_CMD_REPLY cmds on the cmd seq
      with an allocated chunk in cmd->obj.chunk.
      
      So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
      the cmd seq when any cmd returns error. While at it, also remove 'nomem'
      label.
      
      Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be7a7729
  5. 31 12月, 2019 1 次提交
    • K
      sctp: add enabled check for path tracepoint loop. · f398efc1
      Kevin Kou 提交于
      sctp_outq_sack is the main function handles SACK, it is called very
      frequently. As the commit "move trace_sctp_probe_path into sctp_outq_sack"
      added below code to this function, sctp tracepoint is disabled most of time,
      but the loop of transport list will be always called even though the
      tracepoint is disabled, this is unnecessary.
      
      +	/* SCTP path tracepoint for congestion control debugging. */
      +	list_for_each_entry(transport, transport_list, transports) {
      +		trace_sctp_probe_path(transport, asoc);
      +	}
      
      This patch is to add tracepoint enabled check at outside of the loop of
      transport list, and avoid traversing the loop when trace is disabled,
      it is a small optimization.
      Signed-off-by: NKevin Kou <qdkevin.kou@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f398efc1
  6. 28 12月, 2019 1 次提交
    • K
      sctp: do trace_sctp_probe after SACK validation and check · 356b23c0
      Kevin Kou 提交于
      The function sctp_sf_eat_sack_6_2 now performs the Verification
      Tag validation, Chunk length validation, Bogu check, and also
      the detection of out-of-order SACK based on the RFC2960
      Section 6.2 at the beginning, and finally performs the further
      processing of SACK. The trace_sctp_probe now triggered before
      the above necessary validation and check.
      
      this patch is to do the trace_sctp_probe after the chunk sanity
      tests, but keep doing trace if the SACK received is out of order,
      for the out-of-order SACK is valuable to congestion control
      debugging.
      
      v1->v2:
       - keep doing SCTP trace if the SACK is out of order as Marcelo's
         suggestion.
      v2->v3:
       - regenerate the patch as v2 generated on top of v1, and add
         'net-next' tag to the new one as Marcelo's comments.
      Signed-off-by: NKevin Kou <qdkevin.kou@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      356b23c0
  7. 27 12月, 2019 1 次提交
    • K
      sctp: move trace_sctp_probe_path into sctp_outq_sack · f643ee29
      Kevin Kou 提交于
      The original patch bringed in the "SCTP ACK tracking trace event"
      feature was committed at Dec.20, 2017, it replaced jprobe usage
      with trace events, and bringed in two trace events, one is
      TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path).
      The original patch intended to trigger the trace_sctp_probe_path in
      TRACE_EVENT(sctp_probe) as below code,
      
      +TRACE_EVENT(sctp_probe,
      +
      +	TP_PROTO(const struct sctp_endpoint *ep,
      +		 const struct sctp_association *asoc,
      +		 struct sctp_chunk *chunk),
      +
      +	TP_ARGS(ep, asoc, chunk),
      +
      +	TP_STRUCT__entry(
      +		__field(__u64, asoc)
      +		__field(__u32, mark)
      +		__field(__u16, bind_port)
      +		__field(__u16, peer_port)
      +		__field(__u32, pathmtu)
      +		__field(__u32, rwnd)
      +		__field(__u16, unack_data)
      +	),
      +
      +	TP_fast_assign(
      +		struct sk_buff *skb = chunk->skb;
      +
      +		__entry->asoc = (unsigned long)asoc;
      +		__entry->mark = skb->mark;
      +		__entry->bind_port = ep->base.bind_addr.port;
      +		__entry->peer_port = asoc->peer.port;
      +		__entry->pathmtu = asoc->pathmtu;
      +		__entry->rwnd = asoc->peer.rwnd;
      +		__entry->unack_data = asoc->unack_data;
      +
      +		if (trace_sctp_probe_path_enabled()) {
      +			struct sctp_transport *sp;
      +
      +			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
      +					    transports) {
      +				trace_sctp_probe_path(sp, asoc);
      +			}
      +		}
      +	),
      
      But I found it did not work when I did testing, and trace_sctp_probe_path
      had no output, I finally found that there is trace buffer lock
      operation(trace_event_buffer_reserve) in include/trace/trace_events.h:
      
      static notrace void							\
      trace_event_raw_event_##call(void *__data, proto)			\
      {									\
      	struct trace_event_file *trace_file = __data;			\
      	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
      	struct trace_event_buffer fbuffer;				\
      	struct trace_event_raw_##call *entry;				\
      	int __data_size;						\
      									\
      	if (trace_trigger_soft_disabled(trace_file))			\
      		return;							\
      									\
      	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
      									\
      	entry = trace_event_buffer_reserve(&fbuffer, trace_file,	\
      				 sizeof(*entry) + __data_size);		\
      									\
      	if (!entry)							\
      		return;							\
      									\
      	tstruct								\
      									\
      	{ assign; }							\
      									\
      	trace_event_buffer_commit(&fbuffer);				\
      }
      
      The reason caused no output of trace_sctp_probe_path is that
      trace_sctp_probe_path written in TP_fast_assign part of
      TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the
      trace_event_buffer_reserve() when compiler expands Macro,
      
              entry = trace_event_buffer_reserve(&fbuffer, trace_file,        \
                                       sizeof(*entry) + __data_size);         \
                                                                              \
              if (!entry)                                                     \
                      return;                                                 \
                                                                              \
              tstruct                                                         \
                                                                              \
              { assign; }                                                     \
      
      so trace_sctp_probe_path finally can not acquire trace_event_buffer
      and return no output, that is to say the nest of tracepoint entry function
      is not allowed. The function call flow is:
      
      trace_sctp_probe()
      -> trace_event_raw_event_sctp_probe()
       -> lock buffer
       -> trace_sctp_probe_path()
         -> trace_event_raw_event_sctp_probe_path()  --nested
         -> buffer has been locked and return no output.
      
      This patch is to remove trace_sctp_probe_path from the TP_fast_assign
      part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function,
      and trigger sctp_probe_path_trace in sctp_outq_sack.
      
      After this patch, you can enable both events individually,
        # cd /sys/kernel/debug/tracing
        # echo 1 > events/sctp/sctp_probe/enable
        # echo 1 > events/sctp/sctp_probe_path/enable
      
      Or, you can enable all the events under sctp.
      
        # echo 1 > events/sctp/enable
      Signed-off-by: NKevin Kou <qdkevin.kou@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f643ee29
  8. 25 12月, 2019 2 次提交
    • H
      net: add bool confirm_neigh parameter for dst_ops.update_pmtu · bd085ef6
      Hangbin Liu 提交于
      The MTU update code is supposed to be invoked in response to real
      networking events that update the PMTU. In IPv6 PMTU update function
      __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
      confirmed time.
      
      But for tunnel code, it will call pmtu before xmit, like:
        - tnl_update_pmtu()
          - skb_dst_update_pmtu()
            - ip6_rt_update_pmtu()
              - __ip6_rt_update_pmtu()
                - dst_confirm_neigh()
      
      If the tunnel remote dst mac address changed and we still do the neigh
      confirm, we will not be able to update neigh cache and ping6 remote
      will failed.
      
      So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
      should not be invoking dst_confirm_neigh() as we have no evidence
      of successful two-way communication at this point.
      
      On the other hand it is also important to keep the neigh reachability fresh
      for TCP flows, so we cannot remove this dst_confirm_neigh() call.
      
      To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
      to choose whether we should do neigh update or not. I will add the parameter
      in this patch and set all the callers to true to comply with the previous
      way, and fix the tunnel code one by one on later patches.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd085ef6
    • M
      sctp: fix err handling of stream initialization · 61d5d406
      Marcelo Ricardo Leitner 提交于
      The fix on 951c6db9 fixed the issued reported there but introduced
      another. When the allocation fails within sctp_stream_init() it is
      okay/necessary to free the genradix. But it is also called when adding
      new streams, from sctp_send_add_streams() and
      sctp_process_strreset_addstrm_in() and in those situations it cannot
      just free the genradix because by then it is a fully operational
      association.
      
      The fix here then is to only free the genradix in sctp_stream_init()
      and on those other call sites  move on with what it already had and let
      the subsequent error handling to handle it.
      
      Tested with the reproducers from this report and the previous one,
      with lksctp-tools and sctp-tests.
      
      Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com
      Fixes: 951c6db9 ("sctp: fix memleak on err handling of stream initialization")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61d5d406
  9. 18 12月, 2019 1 次提交
  10. 10 12月, 2019 2 次提交
    • X
      sctp: get netns from asoc and ep base · 4e7696d9
      Xin Long 提交于
      Commit 31243461 ("sctp: cache netns in sctp_ep_common") set netns
      in asoc and ep base since they're created, and it will never change.
      It's a better way to get netns from asoc and ep base, comparing to
      calling sock_net().
      
      This patch is to replace them.
      
      v1->v2:
        - no change.
      Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e7696d9
    • X
      sctp: fully initialize v4 addr in some functions · b6f3320b
      Xin Long 提交于
      Syzbot found a crash:
      
        BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
        BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
        BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
        Call Trace:
          crc32_body lib/crc32.c:112 [inline]
          crc32_le_generic lib/crc32.c:179 [inline]
          __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
          chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
          crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
          crc32c+0x150/0x220 lib/libcrc32c.c:47
          sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
          __skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
          sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
          sctp_packet_pack net/sctp/output.c:528 [inline]
          sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
          sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
          sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
          sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
          sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
          sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
          sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
          sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
          sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
          sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
          sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672
      
      The issue was caused by transport->ipaddr set with uninit addr param, which
      was passed by:
      
        sctp_transport_init net/sctp/transport.c:47 [inline]
        sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
        sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
        sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]
      
      where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
      the padding of addr->v4.
      
      Later when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
      will become the part of skb, and the issue occurs.
      
      This patch is to fix it by initializing the padding of addr->v4 in
      sctp_v4_from_addr_param(), as well as other functions that do the similar
      thing, and these functions shouldn't trust that the caller initializes the
      memory, as Marcelo suggested.
      
      Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6f3320b
  11. 05 12月, 2019 1 次提交
  12. 27 11月, 2019 2 次提交
  13. 24 11月, 2019 2 次提交
    • X
      sctp: cache netns in sctp_ep_common · 31243461
      Xin Long 提交于
      This patch is to fix a data-race reported by syzbot:
      
        BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj
      
        write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
          sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
          sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
          sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
          inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
          __sys_accept4+0x224/0x430 net/socket.c:1754
          __do_sys_accept net/socket.c:1795 [inline]
          __se_sys_accept net/socket.c:1792 [inline]
          __x64_sys_accept+0x4e/0x60 net/socket.c:1792
          do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
          sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
          rht_key_get_hash include/linux/rhashtable.h:133 [inline]
          rht_key_hashfn include/linux/rhashtable.h:159 [inline]
          rht_head_hashfn include/linux/rhashtable.h:174 [inline]
          head_hashfn lib/rhashtable.c:41 [inline]
          rhashtable_rehash_one lib/rhashtable.c:245 [inline]
          rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
          rhashtable_rehash_table lib/rhashtable.c:316 [inline]
          rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
          process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
          worker_thread+0xa0/0x800 kernel/workqueue.c:2415
          kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      It was caused by rhashtable access asoc->base.sk when sctp_assoc_migrate
      is changing its value. However, what rhashtable wants is netns from asoc
      base.sk, and for an asoc, its netns won't change once set. So we can
      simply fix it by caching netns since created.
      
      Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
      Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      31243461
    • N
      sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook · b6631c60
      Navid Emamdoost 提交于
      In the implementation of sctp_sf_do_5_2_4_dupcook() the allocated
      new_asoc is leaked if security_sctp_assoc_request() fails. Release it
      via sctp_association_free().
      
      Fixes: 2277c7cd ("sctp: Add LSM hooks")
      Signed-off-by: NNavid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      b6631c60
  14. 09 11月, 2019 5 次提交
    • X
      sctp: add SCTP_PEER_ADDR_THLDS_V2 sockopt · d467ac0a
      Xin Long 提交于
      Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS)
      Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld'
      added to allow a user to change ps_retrans per sock/asoc/transport, as
      other 2 paddrthlds: pf_retrans, pathmaxrxt.
      
      Note: to not break the user's program, here to support pf_retrans dump
      and setting by adding a new sockopt SCTP_PEER_ADDR_THLDS_V2, and a new
      structure sctp_paddrthlds_v2 instead of extending sctp_paddrthlds.
      
      Also, when setting ps_retrans, the value is not allowed to be greater
      than pf_retrans.
      
      v1->v2:
        - use SCTP_PEER_ADDR_THLDS_V2 to set/get pf_retrans instead,
          as Marcelo and David Laight suggested.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d467ac0a
    • X
      sctp: add support for Primary Path Switchover · 34515e94
      Xin Long 提交于
      This is a new feature defined in section 5 of rfc7829: "Primary Path
      Switchover". By introducing a new tunable parameter:
      
        Primary.Switchover.Max.Retrans (PSMR)
      
      The primary path will be changed to another active path when the path
      error counter on the old primary path exceeds PSMR, so that "the SCTP
      sender is allowed to continue data transmission on a new working path
      even when the old primary destination address becomes active again".
      
      This patch is to add this tunable parameter, 'ps_retrans' per netns,
      sock, asoc and transport. It also allows a user to change ps_retrans
      per netns by sysctl, and ps_retrans per sock/asoc/transport will be
      initialized with it.
      
      The check will be done in sctp_do_8_2_transport_strike() when this
      feature is enabled.
      
      Note this feature is disabled by initializing 'ps_retrans' per netns
      as 0xffff by default, and its value can't be less than 'pf_retrans'
      when changing by sysctl.
      
      v3->v4:
        - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
          sysctl 'ps_retrans'.
        - add a new entry for ps_retrans on ip-sysctl.txt.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34515e94
    • X
      sctp: add SCTP_EXPOSE_POTENTIALLY_FAILED_STATE sockopt · 8d2a6935
      Xin Long 提交于
      This is a sockopt defined in section 7.3 of rfc7829: "Exposing
      the Potentially Failed Path State", by which users can change
      pf_expose per sock and asoc.
      
      The new sockopt SCTP_EXPOSE_POTENTIALLY_FAILED_STATE is also
      known as SCTP_EXPOSE_PF_STATE for short.
      
      v2->v3:
        - return -EINVAL if params.assoc_value > SCTP_PF_EXPOSE_MAX.
        - define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
      v3->v4:
        - improve changelog.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d2a6935
    • X
      sctp: add SCTP_ADDR_POTENTIALLY_FAILED notification · 768e1518
      Xin Long 提交于
      SCTP Quick failover draft section 5.1, point 5 has been removed
      from rfc7829. Instead, "the sender SHOULD (i) notify the Upper
      Layer Protocol (ULP) about this state transition", as said in
      section 3.2, point 8.
      
      So this patch is to add SCTP_ADDR_POTENTIALLY_FAILED, defined
      in section 7.1, "which is reported if the affected address
      becomes PF". Also remove transport cwnd's update when moving
      from PF back to ACTIVE , which is no longer in rfc7829 either.
      
      Note that ulp_notify will be set to false if asoc->expose is
      not 'enabled', according to last patch.
      
      v2->v3:
        - define SCTP_ADDR_PF SCTP_ADDR_POTENTIALLY_FAILED.
      v3->v4:
        - initialize spc_state with SCTP_ADDR_AVAILABLE, as Marcelo suggested.
        - check asoc->pf_expose in sctp_assoc_control_transport(), as Marcelo
          suggested.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      768e1518
    • X
      sctp: add pf_expose per netns and sock and asoc · aef587be
      Xin Long 提交于
      As said in rfc7829, section 3, point 12:
      
        The SCTP stack SHOULD expose the PF state of its destination
        addresses to the ULP as well as provide the means to notify the
        ULP of state transitions of its destination addresses from
        active to PF, and vice versa.  However, it is recommended that
        an SCTP stack implementing SCTP-PF also allows for the ULP to be
        kept ignorant of the PF state of its destinations and the
        associated state transitions, thus allowing for retention of the
        simpler state transition model of [RFC4960] in the ULP.
      
      Not only does it allow to expose the PF state to ULP, but also
      allow to ignore sctp-pf to ULP.
      
      So this patch is to add pf_expose per netns, sock and asoc. And in
      sctp_assoc_control_transport(), ulp_notify will be set to false if
      asoc->expose is not 'enabled' in next patch.
      
      It also allows a user to change pf_expose per netns by sysctl, and
      pf_expose per sock and asoc will be initialized with it.
      
      Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
      to not allow a user to query the state of a sctp-pf peer address
      when pf_expose is 'disabled', as said in section 7.3.
      
      v1->v2:
        - Fix a build warning noticed by Nathan Chancellor.
      v2->v3:
        - set pf_expose to UNUSED by default to keep compatible with old
          applications.
      v3->v4:
        - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
        - change this patch to 1/5, and move sctp_assoc_control_transport
          change into 2/5, as Marcelo suggested.
        - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
          set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aef587be
  15. 07 11月, 2019 3 次提交
  16. 02 11月, 2019 1 次提交
    • E
      inet: stop leaking jiffies on the wire · a904a069
      Eric Dumazet 提交于
      Historically linux tried to stick to RFC 791, 1122, 2003
      for IPv4 ID field generation.
      
      RFC 6864 made clear that no matter how hard we try,
      we can not ensure unicity of IP ID within maximum
      lifetime for all datagrams with a given source
      address/destination address/protocol tuple.
      
      Linux uses a per socket inet generator (inet_id), initialized
      at connection startup with a XOR of 'jiffies' and other
      fields that appear clear on the wire.
      
      Thiemo Nagel pointed that this strategy is a privacy
      concern as this provides 16 bits of entropy to fingerprint
      devices.
      
      Let's switch to a random starting point, this is just as
      good as far as RFC 6864 is concerned and does not leak
      anything critical.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a904a069
  17. 29 10月, 2019 2 次提交
  18. 16 10月, 2019 1 次提交
    • X
      sctp: change sctp_prot .no_autobind with true · 63dfb793
      Xin Long 提交于
      syzbot reported a memory leak:
      
        BUG: memory leak, unreferenced object 0xffff888120b3d380 (size 64):
        backtrace:
      
          [...] slab_alloc mm/slab.c:3319 [inline]
          [...] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
          [...] sctp_bucket_create net/sctp/socket.c:8523 [inline]
          [...] sctp_get_port_local+0x189/0x5a0 net/sctp/socket.c:8270
          [...] sctp_do_bind+0xcc/0x200 net/sctp/socket.c:402
          [...] sctp_bindx_add+0x4b/0xd0 net/sctp/socket.c:497
          [...] sctp_setsockopt_bindx+0x156/0x1b0 net/sctp/socket.c:1022
          [...] sctp_setsockopt net/sctp/socket.c:4641 [inline]
          [...] sctp_setsockopt+0xaea/0x2dc0 net/sctp/socket.c:4611
          [...] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3147
          [...] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
          [...] __do_sys_setsockopt net/socket.c:2100 [inline]
      
      It was caused by when sending msgs without binding a port, in the path:
      inet_sendmsg() -> inet_send_prepare() -> inet_autobind() ->
      .get_port/sctp_get_port(), sp->bind_hash will be set while bp->port is
      not. Later when binding another port by sctp_setsockopt_bindx(), a new
      bucket will be created as bp->port is not set.
      
      sctp's autobind is supposed to call sctp_autobind() where it does all
      things including setting bp->port. Since sctp_autobind() is called in
      sctp_sendmsg() if the sk is not yet bound, it should have skipped the
      auto bind.
      
      THis patch is to avoid calling inet_autobind() in inet_send_prepare()
      by changing sctp_prot .no_autobind with true, also remove the unused
      .get_port.
      
      Reported-by: syzbot+d44f7bbebdea49dbc84a@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63dfb793
  19. 12 10月, 2019 1 次提交
  20. 10 10月, 2019 7 次提交
    • E
      net: silence KCSAN warnings about sk->sk_backlog.len reads · 70c26558
      Eric Dumazet 提交于
      sk->sk_backlog.len can be written by BH handlers, and read
      from process contexts in a lockless way.
      
      Note the write side should also use WRITE_ONCE() or a variant.
      We need some agreement about the best way to do this.
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
      
      write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
       sk_add_backlog include/net/sock.h:934 [inline]
       tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
       tcp_space include/net/tcp.h:1373 [inline]
       tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
       tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
       tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       sk_backlog_rcv include/net/sock.h:945 [inline]
       __release_sock+0x135/0x1e0 net/core/sock.c:2427
       release_sock+0x61/0x160 net/core/sock.c:2943
       tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      70c26558
    • E
      net: silence KCSAN warnings around sk_add_backlog() calls · 8265792b
      Eric Dumazet 提交于
      sk_add_backlog() callers usually read sk->sk_rcvbuf without
      owning the socket lock. This means sk_rcvbuf value can
      be changed by other cpus, and KCSAN complains.
      
      Add READ_ONCE() annotations to document the lockless nature
      of these reads.
      
      Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
      but this will be done in separate patches to ease stable
      backports (if we decide this is relevant for stable trees).
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:902 [inline]
       sk_add_backlog include/net/sock.h:933 [inline]
       tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
       tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      8265792b
    • X
      sctp: add SCTP_SEND_FAILED_EVENT event · b6e6b5f1
      Xin Long 提交于
      This patch is to add a new event SCTP_SEND_FAILED_EVENT described in
      rfc6458#section-6.1.11. It's a update of SCTP_SEND_FAILED event:
      
        struct sctp_sndrcvinfo ssf_info is replaced with
        struct sctp_sndinfo ssfe_info in struct sctp_send_failed_event.
      
      SCTP_SEND_FAILED is being deprecated, but we don't remove it in this
      patch. Both are being processed in sctp_datamsg_destroy() when the
      corresp event flag is set.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      b6e6b5f1
    • X
      sctp: add SCTP_ADDR_MADE_PRIM event · 5cd0b917
      Xin Long 提交于
      sctp_ulpevent_nofity_peer_addr_change() would be called in
      sctp_assoc_set_primary() to send SCTP_ADDR_MADE_PRIM event
      when this transport is set to the primary path of the asoc.
      
      This event is described in rfc6458#section-6.1.2:
      
        SCTP_ADDR_MADE_PRIM:  This address has now been made the primary
           destination address.  This notification is provided whenever an
           address is made primary.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      5cd0b917
    • X
      sctp: add SCTP_ADDR_REMOVED event · c446f50c
      Xin Long 提交于
      sctp_ulpevent_nofity_peer_addr_change() is called in
      sctp_assoc_rm_peer() to send SCTP_ADDR_REMOVED event
      when this transport is removed from the asoc.
      
      This event is described in rfc6458#section-6.1.2:
      
        SCTP_ADDR_REMOVED:  The address is no longer part of the
           association.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      c446f50c
    • X
      sctp: add SCTP_ADDR_ADDED event · 4b774032
      Xin Long 提交于
      A helper sctp_ulpevent_nofity_peer_addr_change() will be extracted
      to make peer_addr_change event and enqueue it, and the helper will
      be called in sctp_assoc_add_peer() to send SCTP_ADDR_ADDED event.
      
      This event is described in rfc6458#section-6.1.2:
      
        SCTP_ADDR_ADDED:  The address is now part of the association.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      4b774032
    • X
      sctp: add chunks to sk_backlog when the newsk sk_socket is not set · 819be810
      Xin Long 提交于
      This patch is to fix a NULL-ptr deref in selinux_socket_connect_helper:
      
        [...] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [...] RIP: 0010:selinux_socket_connect_helper+0x94/0x460
        [...] Call Trace:
        [...]  selinux_sctp_bind_connect+0x16a/0x1d0
        [...]  security_sctp_bind_connect+0x58/0x90
        [...]  sctp_process_asconf+0xa52/0xfd0 [sctp]
        [...]  sctp_sf_do_asconf+0x785/0x980 [sctp]
        [...]  sctp_do_sm+0x175/0x5a0 [sctp]
        [...]  sctp_assoc_bh_rcv+0x285/0x5b0 [sctp]
        [...]  sctp_backlog_rcv+0x482/0x910 [sctp]
        [...]  __release_sock+0x11e/0x310
        [...]  release_sock+0x4f/0x180
        [...]  sctp_accept+0x3f9/0x5a0 [sctp]
        [...]  inet_accept+0xe7/0x720
      
      It was caused by that the 'newsk' sk_socket was not set before going to
      security sctp hook when processing asconf chunk with SCTP_PARAM_ADD_IP
      or SCTP_PARAM_SET_PRIMARY:
      
        inet_accept()->
          sctp_accept():
            lock_sock():
                lock listening 'sk'
                                                do_softirq():
                                                  sctp_rcv():  <-- [1]
                                                      asconf chunk arrives and
                                                      enqueued in 'sk' backlog
            sctp_sock_migrate():
                set asoc's sk to 'newsk'
            release_sock():
                sctp_backlog_rcv():
                  lock 'newsk'
                  sctp_process_asconf()  <-- [2]
                  unlock 'newsk'
          sock_graft():
              set sk_socket  <-- [3]
      
      As it shows, at [1] the asconf chunk would be put into the listening 'sk'
      backlog, as accept() was holding its sock lock. Then at [2] asconf would
      get processed with 'newsk' as asoc's sk had been set to 'newsk'. However,
      'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect()
      would deref it, then kernel crashed.
      
      Here to fix it by adding the chunk to sk_backlog until newsk sk_socket is
      set when .accept() is done.
      
      Note that sk->sk_socket can be NULL when the sock is closed, so SOCK_DEAD
      flag is also needed to check in sctp_newsk_ready().
      
      Thanks to Ondrej for reviewing the code.
      
      Fixes: d452930f ("selinux: Add SCTP support")
      Reported-by: NYing Xu <yinxu@redhat.com>
      Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      819be810
  21. 02 10月, 2019 1 次提交
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal 提交于
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
  22. 27 9月, 2019 1 次提交
  23. 14 9月, 2019 1 次提交
    • M
      sctp: destroy bucket if failed to bind addr · 29b99f54
      Mao Wenan 提交于
      There is one memory leak bug report:
      BUG: memory leak
      unreferenced object 0xffff8881dc4c5ec0 (size 40):
        comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
        hex dump (first 32 bytes):
          02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00  .c=.............
        backtrace:
          [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
          [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
          [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
          [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
          [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
          [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
          [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
          [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
          [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
          [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is because in sctp_do_bind, if sctp_get_port_local is to
      create hash bucket successfully, and sctp_add_bind_addr failed
      to bind address, e.g return -ENOMEM, so memory leak found, it
      needs to destroy allocated bucket.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29b99f54