1. 07 Jul, 2018 3 commits
    • tipc: fix correct setting of message type in second discoverer · 92018c7c
      Jon Maloy committed
      The duplicate address discovery protocol is not safe against two
      discoverers running in parallel. The one executing first after the
      trial period is over will set the node address and change its own
      message type to DSC_REQ_MSG. The one executing last may find that the
      node address is already set, and never change message type, with the
      result that its links may never be established.
      
      In this commit we ensure that the message type is always set correctly
      after the trial period is over.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92018c7c
    • tipc: correct discovery message handling during address trial period · e415577f
      Jon Maloy committed
      With the duplicate address discovery protocol for TIPC node addresses
      we introduced a one-second trial period before a node is allocated a
      hash number to use as its address.
      
      Unfortunately, we fail to handle the case where a regular LINK REQUEST/
      RESPONSE arrives from a cluster node during the trial period. Such
      messages are not ignored as they should be, leading to link setup
      attempts while the node still has no address.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e415577f
    • tipc: fix wrong return value from function tipc_node_try_addr() · 2a57f182
      Jon Maloy committed
      The function for checking whether there is a node address conflict is
      supposed to return a suggestion for a new address if it finds a
      conflict, and zero otherwise. But if the peer being checked
      is previously unknown, it instead returns a "suggestion" for
      the checked address itself. This results in a DSC_TRIAL_FAIL_MSG
      being sent unnecessarily to the peer, and sometimes makes the trial
      period start over again.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a57f182
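The return contract described in the tipc_node_try_addr() commit above can be sketched in plain C. This is a simplified userspace model, not the TIPC code: `struct peer`, `try_addr()` and `suggest_other_addr()` are hypothetical names, and the "suggestion" is just an arbitrary different value. It only shows that an unknown peer must yield zero rather than the checked address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a peer we already know about. */
struct peer {
    uint32_t addr;
    char id[16];
};

/* Hypothetical generator: returns some non-zero address other than addr. */
static uint32_t suggest_other_addr(uint32_t addr)
{
    uint32_t next = addr + 1;
    return next ? next : 1;
}

/*
 * Returns 0 when there is no conflict, otherwise a suggested new address.
 * A previously unknown address is not a conflict, so it also returns 0.
 */
static uint32_t try_addr(const struct peer *known, size_t n,
                         const char *id, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (known[i].addr != addr)
            continue;
        /* Same address and same identity: this is the peer we know. */
        if (strcmp(known[i].id, id) == 0)
            return 0;
        /* Same address, different identity: a real conflict. */
        return suggest_other_addr(addr);
    }
    /* Unknown address: return zero, not the checked address itself. */
    return 0;
}
```

With this shape, a caller can treat any non-zero result as "send a trial-fail message with this suggestion" without special-casing unknown peers.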
  2. 06 Jul, 2018 1 commit
    • ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns · 70ba5b6d
      Tyler Hicks committed
      The low and high values of the net.ipv4.ping_group_range sysctl were
      being silently forced to the default disabled state when a write to the
      sysctl contained GIDs that didn't map to the associated user namespace.
      Confusingly, the sysctl's write operation would return success and then
      a subsequent read of the sysctl would indicate that the low and high
      values are the overflowgid.
      
      This patch changes the behavior by clearly returning an error when the
      sysctl write operation receives a GID range that doesn't map to the
      associated user namespace. In such a situation, the previous value of
      the sysctl is preserved and that range will be returned in a subsequent
      read of the sysctl.
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70ba5b6d
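The behaviour the ping_group_range commit above describes (reject an unmappable range with an error and keep the previous value) can be sketched as follows. This is a userspace model under stated assumptions: `struct ns_map` stands in for a user namespace's GID mapping as a single contiguous range, and `gid_maps()` and `write_range()` are hypothetical names, not kernel functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Current sysctl value: an inclusive GID range. */
struct gid_range {
    unsigned int low;
    unsigned int high;
};

/* Hypothetical stand-in for a user namespace GID mapping. */
struct ns_map {
    unsigned int first;  /* first mapped GID */
    unsigned int count;  /* number of mapped GIDs */
};

static bool gid_maps(const struct ns_map *ns, unsigned int gid)
{
    return gid >= ns->first && gid - ns->first < ns->count;
}

/*
 * Validate before committing: on failure return -EINVAL and leave *cur
 * untouched, so a later read still reports the previous range.
 */
static int write_range(struct gid_range *cur, const struct ns_map *ns,
                       unsigned int low, unsigned int high)
{
    if (!gid_maps(ns, low) || !gid_maps(ns, high))
        return -EINVAL;

    cur->low = low;
    cur->high = high;
    return 0;
}
```

The point of the sketch is the ordering: nothing is stored until both ends of the range have been checked.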
  3. 05 Jul, 2018 3 commits
    • net: qrtr: Reset the node and port ID of broadcast messages · d27e77a3
      Arun Kumar Neelakantam committed
      All control messages broadcast to remote routers use
      QRTR_NODE_BCAST instead of the local router's node ID, which causes
      the packets to be dropped on the remote router due to an invalid node ID.
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d27e77a3
    • net: qrtr: Broadcast messages only from control port · fdf5fd39
      Arun Kumar Neelakantam committed
      The broadcast node id should only be sent with the control port id.
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdf5fd39
    • ipv6: make ipv6_renew_options() interrupt/kernel safe · a9ba23d4
      Paul Moore committed
      At present the ipv6_renew_options_kern() function ends up calling into
      access_ok() which is problematic if done from inside an interrupt as
      access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
      (x86-64 is affected).  Example warning/backtrace is shown below:
      
       WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
       ...
       Call Trace:
        <IRQ>
        ipv6_renew_option+0xb2/0xf0
        ipv6_renew_options+0x26a/0x340
        ipv6_renew_options_kern+0x2c/0x40
        calipso_req_setattr+0x72/0xe0
        netlbl_req_setattr+0x126/0x1b0
        selinux_netlbl_inet_conn_request+0x80/0x100
        selinux_inet_conn_request+0x6d/0xb0
        security_inet_conn_request+0x32/0x50
        tcp_conn_request+0x35f/0xe00
        ? __lock_acquire+0x250/0x16c0
        ? selinux_socket_sock_rcv_skb+0x1ae/0x210
        ? tcp_rcv_state_process+0x289/0x106b
        tcp_rcv_state_process+0x289/0x106b
        ? tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_rcv+0xc82/0xcf0
        ip6_input_finish+0x10d/0x690
        ip6_input+0x45/0x1e0
        ? ip6_rcv_finish+0x1d0/0x1d0
        ipv6_rcv+0x32b/0x880
        ? ip6_make_skb+0x1e0/0x1e0
        __netif_receive_skb_core+0x6f2/0xdf0
        ? process_backlog+0x85/0x250
        ? process_backlog+0x85/0x250
        ? process_backlog+0xec/0x250
        process_backlog+0xec/0x250
        net_rx_action+0x153/0x480
        __do_softirq+0xd9/0x4f7
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        ...
      
      While not present in the backtrace, ipv6_renew_option() ends up calling
      access_ok() via the following chain:
      
        access_ok()
        _copy_from_user()
        copy_from_user()
        ipv6_renew_option()
      
      The fix presented in this patch is to perform the userspace copy
      earlier in the call chain such that it is only called when the option
      data is actually coming from userspace; that place is
      do_ipv6_setsockopt().  Not only does this solve the problem seen in
      the backtrace above, it also allows us to simplify the code quite a
      bit by removing ipv6_renew_options_kern() completely.  We also take
      this opportunity to clean up ipv6_renew_options()/ipv6_renew_option()
      a small amount as well.
      
      This patch is heavily based on a rough patch by Al Viro.  I've taken
      his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
      to a memdup_user() call, made better use of the e_inval jump target in
      the same function, and cleaned up the use of ipv6_renew_option() by
      ipv6_renew_options().
      
      CC: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9ba23d4
  4. 04 Jul, 2018 3 commits
  5. 03 Jul, 2018 1 commit
  6. 02 Jul, 2018 2 commits
  7. 01 Jul, 2018 1 commit
    • tcp: prevent bogus FRTO undos with non-SACK flows · 1236f22f
      Ilpo Järvinen committed
      If SACK is not enabled and the first cumulative ACK after the RTO
      retransmission covers more than the retransmitted skb, a spurious
      FRTO undo will trigger (assuming FRTO is enabled for that RTO).
      The reason is that any non-retransmitted segment acknowledged will
      set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
      no indication that it would have been delivered for real (the
      scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
      case so the check for that bit won't help like it does with SACK).
      Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
      in tcp_process_loss.
      
      We need to use a stricter condition for the non-SACK case and check
      that none of the cumulatively ACKed segments were retransmitted
      to prove that progress is due to original transmissions. Only then
      keep FLAG_ORIG_SACK_ACKED set, allowing the FRTO undo to proceed in
      the non-SACK case.
      
      (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
      to better indicate its purpose but to keep this change minimal, it
      will be done in another patch).
      
      Besides burstiness and congestion control violations, this problem
      can result in an RTO loop: when the loss recovery is prematurely
      undone, only new data will be transmitted (if available) and
      the next retransmission can occur only after a new RTO, which in the case
      of multiple losses (that are not for consecutive packets) requires
      one RTO per loss to recover.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1236f22f
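The stricter non-SACK condition described in the FRTO commit above can be illustrated with a small flag filter. This is a sketch, not the kernel code: the flag values are placeholders, `filter_frto_flags()` is a hypothetical helper, and FLAG_RETRANS_DATA_ACKED is assumed here to mean "a retransmitted segment was among those cumulatively ACKed".

```c
#include <stdbool.h>

/* Placeholder bit values chosen for this sketch only. */
#define FLAG_RETRANS_DATA_ACKED 0x1
#define FLAG_ORIG_SACK_ACKED    0x2

/*
 * Without SACK there is no per-segment scoreboard, so an ACK that covers
 * any retransmitted segment cannot prove that original transmissions made
 * progress. In that case, drop the flag that would permit the FRTO undo.
 */
static int filter_frto_flags(int flag, bool sack_enabled)
{
    if (!sack_enabled && (flag & FLAG_RETRANS_DATA_ACKED))
        flag &= ~FLAG_ORIG_SACK_ACKED;
    return flag;
}
```

With SACK enabled the flags pass through unchanged, since the scoreboard check already covers that case.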
  8. 30 Jun, 2018 5 commits
  9. 29 Jun, 2018 7 commits
    • net: handle NULL ->poll gracefully · e88958e6
      Christoph Hellwig committed
      The big aio poll revert broke various network protocols that don't
      implement ->poll, because a patch in the aio poll series had removed
      sock_no_poll and made the common code handle this case.
      
      Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com
      Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com
      Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Fixes: a11e1d43 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e88958e6
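Handling a missing ->poll in common code, as the commit above describes, amounts to a NULL check before the indirect call. The sketch below is a userspace model: `struct proto_ops_sketch`, `sock_poll_sketch()` and `always_readable()` are hypothetical names, and returning an empty event mask for a protocol without ->poll is an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down operations table with an optional poll hook. */
struct proto_ops_sketch {
    unsigned int (*poll)(void *sock);
};

/* Example hook used only to exercise the non-NULL path. */
static unsigned int always_readable(void *sock)
{
    (void)sock;
    return 0x1;
}

/*
 * Common-code wrapper: a protocol that does not implement poll reports
 * no events instead of causing a NULL function pointer call.
 */
static unsigned int sock_poll_sketch(const struct proto_ops_sketch *ops,
                                     void *sock)
{
    if (!ops->poll)
        return 0;
    return ops->poll(sock);
}
```

Putting the check in the wrapper means individual protocols do not each need a stub implementation.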
    • net, mm: account sock objects to kmemcg · e699e2c6
      Shakeel Butt committed
      Currently the kernel accounts the memory for network traffic through
      mem_cgroup_[un]charge_skmem() interface. However the memory accounted
      only includes the truesize of sk_buff which does not include the size of
      sock objects. In our production environment, with opt-out kmem
      accounting, the sock kmem caches (TCP[v6], UDP[v6], RAW[v6], UNIX) are
      among the most heavily charged kmem caches and consume a significant amount
      of memory which cannot be left as system overhead. So, this patch
      converts the kmem caches of all sock objects to SLAB_ACCOUNT.
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e699e2c6
    • nl80211: check nla_parse_nested() return values · 95bca62f
      Johannes Berg committed
      At the very least we should check the return value if
      nla_parse_nested() is called with a non-NULL policy.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      95bca62f
    • nl80211: relax ht operation checks for mesh · 188f60ab
      Bob Copeland committed
      Commit 9757235f ("nl80211: correct checks for
      NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT
      operation field in meshconf, while also adding checks requiring
      the non-greenfield and non-ht-sta bits to be set in certain
      circumstances.  The latter bit is actually reserved for mesh BSSes
      according to Table 9-168 in 802.11-2016, so in fact it should not
      be set.
      
      wpa_supplicant sets these bits because the mesh and AP code share
      the same implementation, but authsae does not.  As a result, some
      meshconf updates from authsae which set only the NONHT_MIXED
      protection bits were being rejected.
      
      In order to avoid breaking userspace by changing the rules again,
      simply accept the values with or without the bits set, and mask
      off the reserved bit to match the spec.
      
      While in here, update the 802.11-2012 reference to 802.11-2016.
      
      Fixes: 9757235f ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
      Cc: Masashi Honma <masashi.honma@gmail.com>
      Signed-off-by: Bob Copeland <bobcopeland@fb.com>
      Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      188f60ab
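The "accept with or without the bit, then mask it off" approach from the mesh HT operation commit above can be sketched as a one-line sanitizer. The helper name `sanitize_mesh_ht_opmode()` is hypothetical, and the bit position used for the non-HT-STA-present bit is an assumption for this sketch; check the 802.11-2016 table cited in the commit before relying on it.

```c
#include <stdint.h>

/* Assumed position of the non-HT-STA-present bit (reserved for mesh BSSes). */
#define HT_OP_MODE_NON_HT_STA_PRSNT 0x0010u

/*
 * Accept the value regardless of the reserved bit, but clear the bit
 * before storing so the result matches the spec for mesh.
 */
static uint16_t sanitize_mesh_ht_opmode(uint16_t ht_opmode)
{
    return (uint16_t)(ht_opmode & ~HT_OP_MODE_NON_HT_STA_PRSNT);
}
```

Masking instead of rejecting keeps both kinds of userspace working: one that sets the bit and one that does not end up with the same stored value.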
    • mac80211: disable BHs/preemption in ieee80211_tx_control_port() · e7441c92
      Denis Kenzior committed
      On preemption-enabled kernels the following message was printed because of
      missing local_bh_disable/local_bh_enable calls.  mac80211 assumes that
      preemption is disabled in the data path.
      
          BUG: using smp_processor_id() in preemptible [00000000] code: iwd/517
          caller is __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
          [...]
          Call Trace:
          dump_stack+0x5c/0x80
          check_preemption_disabled.cold.0+0x46/0x51
          __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
      
      Fixes: 91180649 ("mac80211: Add support for tx_control_port")
      Signed-off-by: Denis Kenzior <denkenz@gmail.com>
      [commit message rewrite, fixes tag]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      e7441c92
    • bpf: Change bpf_fib_lookup to return lookup status · 4c79579b
      David Ahern committed
      For ACLs implemented using either FIB rules or FIB entries, the BPF
      program needs the FIB lookup status to be able to drop the packet.
      Since the bpf_fib_lookup API has not reached a released kernel yet,
      change the return code to contain an encoding of the FIB lookup
      result and return the nexthop device index in the params struct.
      
      In addition, inform the BPF program of any post FIB lookup reason as
      to why the packet needs to go up the stack.
      
      The fib result for unicast routes must have an egress device, so remove
      the check that it is non-NULL.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4c79579b
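The calling pattern implied by the bpf_fib_lookup commit above, where the return code carries the lookup status and the egress device index travels in the params struct, might look like this from the caller's side. Everything here is hypothetical: the enum values, `struct fib_params_sketch` and `decide()` are illustrative names, not the BPF helper API.

```c
/* Hypothetical status codes returned by the lookup. */
enum fib_status {
    FIB_OK = 0,          /* route found, params hold the egress device */
    FIB_BLACKHOLE,       /* route says drop */
    FIB_UNREACHABLE,     /* route says drop */
    FIB_PROHIBIT,        /* route says drop */
    FIB_NEEDS_STACK      /* lookup worked, but the stack must handle it */
};

/* Hypothetical params struct: the nexthop device index is an output field. */
struct fib_params_sketch {
    int ifindex;
};

enum action { ACT_PASS_TO_STACK, ACT_DROP, ACT_REDIRECT };

/* Map the lookup status to what the program does with the packet. */
static enum action decide(enum fib_status rc)
{
    switch (rc) {
    case FIB_OK:
        return ACT_REDIRECT;
    case FIB_BLACKHOLE:
    case FIB_UNREACHABLE:
    case FIB_PROHIBIT:
        return ACT_DROP;
    default:
        return ACT_PASS_TO_STACK;
    }
}
```

Separating the status from the device index is what lets an ACL expressed as FIB rules or entries actually drop a packet, rather than only signalling "no redirect".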
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds committed
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  10. 28 Jun, 2018 5 commits
  11. 27 Jun, 2018 3 commits
  12. 26 Jun, 2018 2 commits
  13. 23 Jun, 2018 4 commits
    • net_sched: remove a bogus warning in hfsc · 35b42da6
      Cong Wang committed
      In update_vf():
      
        cftree_remove(cl);
        update_cfmin(cl->cl_parent);
      
      the cl_cfmin of cl->cl_parent is intentionally updated to 0
      when that parent has only one child. If this parent is the
      root qdisc, hfsc_schedule_watchdog() may be unable to decide
      the next schedule time for the qdisc watchdog. It seems safe
      to simply skip scheduling in that case, as this watchdog is
      not always scheduled anyway.
      
      Thanks to Marco for testing all the cases, nothing is broken.
      Reported-by: Marco Berizzi <pupilla@libero.it>
      Tested-by: Marco Berizzi <pupilla@libero.it>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35b42da6
    • net: dccp: switch rx_tstamp_last_feedback to monotonic clock · 0ce4e70f
      Eric Dumazet committed
      To compute delays, it is better not to use the time of day, which can
      be changed by admins or malicious programs.
      
      Also change ccid3_first_li() to use s64 type for delta variable
      to avoid potential overflows.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ce4e70f
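The idea in the DCCP commit above, measuring delays against a monotonic clock and holding the difference in a signed 64-bit value, has a direct userspace analogue. The sketch below uses POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` rather than any kernel time API, and `monotonic_us()` and `elapsed_us()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Microseconds from a clock that wall-clock changes cannot move backwards. */
static int64_t monotonic_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Elapsed time since an earlier monotonic_us() sample. The signed 64-bit
 * type leaves ample headroom, so the subtraction does not overflow for
 * realistic uptimes.
 */
static int64_t elapsed_us(int64_t earlier)
{
    return monotonic_us() - earlier;
}
```

Because both samples come from the same monotonic source, the result is unaffected by an administrator or program setting the system time between them.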
    • net: dccp: avoid crash in ccid3_hc_rx_send_feedback() · 74174fe5
      Eric Dumazet committed
      On fast hosts or malicious bots, we trigger a DCCP_BUG() which
      seems excessive.
      
      syzbot reported:
      
      BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
       ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
       ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
       dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
       dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
       dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
       sk_backlog_rcv include/net/sock.h:914 [inline]
       __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
       dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
       ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
       NF_HOOK include/linux/netfilter.h:287 [inline]
       ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
       NF_HOOK include/linux/netfilter.h:287 [inline]
       ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
       __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       process_backlog+0x219/0x760 net/core/dev.c:5373
       napi_poll net/core/dev.c:5771 [inline]
       net_rx_action+0x7da/0x1980 net/core/dev.c:5837
       __do_softirq+0x2e8/0xb17 kernel/softirq.c:284
       run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
       smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74174fe5
    • ipv6: mcast: fix unsolicited report interval after receiving querys · 6c6da928
      Hangbin Liu committed
      After receiving MLD queries, we update idev->mc_maxdelay with max_delay
      from the query header. This makes later unsolicited reports use
      mc_maxdelay as their interval, which means we may send unsolicited reports
      at a long interval instead of the default configured interval.
      
      Also, as we do not call ipv6_mc_reset() after the device comes up, this issue
      persists even after leaving the group and joining other groups.
      
      Fixes: fc4eba58 ("ipv6: make unsolicited report intervals configurable for mld")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c6da928