1. 03 Apr, 2020 1 commit
    • net: core: enable SO_BINDTODEVICE for non-root users · c427bfec
      Committed by Vincent Bernat
      Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
      non-root user to bind a socket to an interface if it is not already
      bound. This is useful to allow an application to bind itself to a
      specific VRF for outgoing or incoming connections. Currently, an
      application wanting to manage connections through several VRFs needs to
      be privileged.
      
      Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
      Wine (76e21053 and c4062dfc) specifically for use by
      non-root processes. However, they are restricted to sendmsg() and not
      usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
      get the same privilege. As for TCP servers, outside the VRF use case,
      SO_BINDTODEVICE would only further restrict connections a server could
      accept.
      
      When an application is restricted to a VRF (with `ip vrf exec`), the
      socket is bound to an interface at creation and therefore, a
      non-privileged call to SO_BINDTODEVICE to escape the VRF fails.
      
      When an application binds a socket with SO_BINDTODEVICE and passes it
      to a non-privileged process through a Unix socket, an attempt to
      change the bound device also fails.
      
      Before:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
      
      After:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
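
The Before/After sessions above can be wrapped in a small helper. This is a minimal sketch, not part of the commit: the function name `try_bind_to_device` is hypothetical, and it assumes a Linux host where Python exposes `socket.SO_BINDTODEVICE`. It reports whether the bind succeeded instead of raising.

```python
import socket


def try_bind_to_device(sock, ifname):
    """Return True if sock was bound to ifname, False otherwise.

    Hypothetical helper for illustration; it only wraps the setsockopt()
    call shown in the commit message.
    """
    # SO_BINDTODEVICE is Linux-specific; the constant may be missing elsewhere.
    opt = getattr(socket, "SO_BINDTODEVICE", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.SOL_SOCKET, opt, ifname.encode())
        return True
    except OSError:
        # Covers PermissionError (no CAP_NET_RAW on older kernels, or the
        # socket is already bound) and a missing interface.
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(try_bind_to_device(s, "dummy0"))
```

Following the commit message, an unprivileged second call on the same socket would be expected to return False on a kernel with this change, since the socket is already bound.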
      Signed-off-by: Vincent Bernat <vincent@bernat.ch>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Apr, 2020 8 commits
    • mptcp: fix "fn parameter not described" warnings · 564cf2f3
      Committed by Matthieu Baerts
      Obtained with:
      
        $ make W=1 net/mptcp/token.o
        net/mptcp/token.c:53: warning: Function parameter or member 'req' not described in 'mptcp_token_new_request'
        net/mptcp/token.c:98: warning: Function parameter or member 'sk' not described in 'mptcp_token_new_connect'
        net/mptcp/token.c:133: warning: Function parameter or member 'conn' not described in 'mptcp_token_new_accept'
        net/mptcp/token.c:178: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy_request'
        net/mptcp/token.c:191: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy'
      
      Fixes: 79c0949e (mptcp: Add key generation and token tree)
      Fixes: 58b09919 (mptcp: create msk early)
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: re-check dsn before reading from subflow · de06f573
      Committed by Florian Westphal
      mptcp_subflow_data_available() is commonly called via
      ssk->sk_data_ready(); in this case the mptcp socket lock
      cannot be acquired.
      
      Therefore, while we can safely discard subflow data that
      was already received up to msk->ack_seq, we cannot be sure
      that 'subflow->data_avail' will still be valid at the time
      userspace wants to read the data -- a previous read on a
      different subflow might have carried this data already.
      
      In that (unlikely) event, msk->ack_seq will have been updated
      and will be ahead of the subflow dsn.
      
      We can check for this condition and skip/resync to the expected
      sequence number.
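
The check described above can be modelled in a few lines. This is a toy sketch, not the kernel code: the function name `bytes_to_skip` and its parameters are hypothetical, and it ignores 64-bit sequence-number wraparound for simplicity.

```python
def bytes_to_skip(ack_seq, subflow_dsn, data_len):
    """Return how many leading bytes of a subflow mapping are stale.

    ack_seq:     next data sequence number the MPTCP socket expects
    subflow_dsn: data sequence number at the start of the subflow's data
    data_len:    number of bytes available on the subflow
    """
    # Anything below ack_seq was already delivered through another subflow.
    already_delivered = ack_seq - subflow_dsn
    if already_delivered <= 0:
        return 0
    # Never skip more than the subflow actually holds.
    return min(already_delivered, data_len)
```

With `ack_seq=120`, `subflow_dsn=100` and 50 bytes available, the first 20 bytes would be skipped and reading would resume at the expected sequence number.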
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: subflow: check parent mptcp socket on subflow state change · 59832e24
      Committed by Florian Westphal
      This is needed at least until proper MPTCP-Level fin/reset
      signalling gets added:
      
      We wake parent when a subflow changes, but we should do this only
      when all subflows have closed, not just one.
      
      Schedule the mptcp worker and tell it to check eof state on all
      subflows.
      
      Only flag mptcp socket as closed and wake userspace processes blocking
      in poll if all subflows have closed.
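
The rule above ("close the parent only when every subflow has closed") can be illustrated with a toy model. The `Subflow` class and `parent_should_close` function are hypothetical names for this sketch and do not correspond to kernel structures.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    closed: bool = False


def parent_should_close(subflows):
    """Return True only when every subflow has reached the closed state."""
    # One closed subflow is not enough; all of them must be closed.
    # In this toy model an empty list also counts as "all closed".
    return all(sf.closed for sf in subflows)
```

A worker checking state on all subflows would call something like `parent_should_close()` after each subflow state change, rather than waking the parent on the first close.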
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix tcp fallback crash · 0b4f33de
      Committed by Florian Westphal
      Christoph Paasch reports the following crash:
      
      general protection fault [..]
      CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62
      RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471
      [..]
       queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
       do_raw_spin_lock include/linux/spinlock.h:181 [inline]
       spin_lock_bh include/linux/spinlock.h:343 [inline]
       __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278
       mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882
      [..]
      
      The problem is that the socket passed to mptcp_shutdown() isn't an mptcp
      socket; it's a plain tcp_sk.  Thus, trying to access mptcp_sk-specific
      members accesses garbage.
      
      The root cause is that accept() returns a fallback (tcp) socket, not an mptcp
      one.  There is code in getpeername to detect this and override the socket's
      stream_ops.  But this will only run when the accept() caller provided a
      sockaddr struct.  "accept(fd, NULL, 0)" will therefore result in
      mptcp stream ops, but with sock->sk pointing at a tcp_sk.
      
      Update the existing fallback handling to detect this as well.
      
      Moreover, mptcp_shutdown did not have fallback handling, and
      mptcp_poll did it too late, so add that there as well.
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: rpl_iptunnel: remove redundant assignments to variable err · d16fa759
      Committed by Colin Ian King
      The variable err is being initialized with a value that is never
      read and it is being updated later with a new value.  The initialization
      is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: dsa_bridge_mtu_normalization() can be static · bf88dc32
      Committed by kbuild test robot
      Fixes: f41071407c85 ("net: dsa: implement auto-normalization of MTU for bridge hardware datapath")
      Signed-off-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: don't auto-add link-local address to lag ports · 744fdc82
      Committed by Jarod Wilson
      Bonding slave and team port devices should not have link-local addresses
      automatically added to them, as it can interfere with openvswitch being
      able to properly add tc ingress.
      
      Basic reproducer, courtesy of Marcelo:
      
      $ ip link add name bond0 type bond
      $ ip link set dev ens2f0np0 master bond0
      $ ip link set dev ens2f1np2 master bond0
      $ ip link set dev bond0 up
      $ ip a s
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
             valid_lft forever preferred_lft forever
      
      (above trimmed to relevant entries, obviously)
      
      $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0
      net.ipv6.conf.ens2f0np0.addr_gen_mode = 0
      $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0
      net.ipv6.conf.ens2f1np2.addr_gen_mode = 0
      
      $ ip a l ens2f0np0
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      $ ip a l ens2f1np2
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      
      It looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
      this a slave interface?" check added by commit c2edacf8, which
      results in an address getting added, while with the proposed patch applied,
      no address gets added. This simply adds the same gating check to another
      code path, and thus should prevent the same devices from erroneously
      obtaining an ipv6 link-local address.
      
      Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
      Reported-by: Moshe Levi <moshele@mellanox.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: add a temporary refcnt for struct tcindex_data · 304e0242
      Committed by Cong Wang
      Although we intentionally use an ordered workqueue for all tc
      filter works, the ordering is not guaranteed by RCU work,
      given that tcf_queue_work() is essentially a call_rcu().
      
      This problem is demonstrated by Thomas:
      
        CPU 0:
          tcf_queue_work()
            tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
      
        -> Migration to CPU 1
      
        CPU 1:
           tcf_queue_work(&p->rwork, tcindex_destroy_work);
      
      so the 2nd work could be queued before the 1st one, which leads
      to a free-after-free.
      
      Enforcing this order in RCU work is hard, as it requires changing
      RCU code too. Fortunately, we can work around this problem in the tcindex
      filter by taking a temporary refcnt: we only refcnt it right before
      we begin to destroy it. This simplifies the code a lot, as a full
      refcnt requires many more changes in tcindex_set_parms().
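
The idea of a temporary reference count can be sketched with a toy model. This is not the kernel implementation: `TcindexData`, `destroy_rexts_work`, `destroy_work` and `run` are hypothetical stand-ins, and the model only shows that the final free happens last regardless of the order in which the two works run.

```python
class TcindexData:
    """Toy stand-in for struct tcindex_data (illustration only)."""

    def __init__(self):
        self.refcnt = 1      # reference held by the owner
        self.freed = False
        self.log = []

    def get(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            # Last reference dropped: only now is the data released.
            self.freed = True
            self.log.append("free data")


def destroy_rexts_work(p):
    # This work still touches p, so p must not have been freed yet.
    assert not p.freed, "use after free"
    p.log.append("destroy rexts")
    p.put()  # drop the temporary reference taken when the work was queued


def destroy_work(p):
    p.put()  # drop the owner's reference


def run(order):
    p = TcindexData()
    p.get()  # temporary reference for the pending rexts work
    works = {"rexts": destroy_rexts_work, "data": destroy_work}
    for name in order:
        works[name](p)
    return p.log
```

Running `run(["rexts", "data"])` and `run(["data", "rexts"])` yields the same log in this model, because the data is only freed when the last reference is dropped.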
      
      Reported-by: syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com
      Fixes: 3d210534 ("net_sched: fix a race condition in tcindex_destroy()")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Apr, 2020 3 commits
  4. 31 Mar, 2020 18 commits
  5. 30 Mar, 2020 10 commits